CN118713865B - Method and system for detecting abnormal behavior of application software based on network communication similarity - Google Patents

Method and system for detecting abnormal behavior of application software based on network communication similarity

Info

Publication number
CN118713865B
Authority
CN
China
Prior art keywords
software
behavior
behavior chain
similarity
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410698083.XA
Other languages
Chinese (zh)
Other versions
CN118713865A (en)
Inventor
罗彩云
张海霞
陈志飞
周邵文
彭媛媛
金俊峰
刘祥宾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN202410698083.XA
Publication of CN118713865A
Application granted
Publication of CN118713865B
Status: Active
Anticipated expiration

Abstract

Translated from Chinese

The present invention discloses a method and system for detecting abnormal behavior of application software based on network communication similarity, belonging to the field of network security technology. The method comprises: grouping data packets according to data packet characteristics, and constructing each group of data packets into a software behavior chain; generating a feature vector describing the statistical characteristics of the behavior chain for each software behavior chain; obtaining the abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical characteristics of the behavior chain and the center vector of each cluster in the baseline model; wherein the baseline model is constructed based on normal application software communication flow data. The present invention can more accurately observe the abnormality of the network communication behavior of the application software.

Description

Method and system for detecting abnormal behavior of application software based on network communication similarity
Technical Field
The invention relates to the technical field of network security, in particular to a method and a system for detecting abnormal behaviors of application software based on network communication similarity.
Background
Anomaly detection for software communication behavior is an important technology in the field of network security. With the rapid development of information technology, network attack techniques keep evolving and security threats keep spreading, so traditional network security protections can no longer meet increasingly urgent protection requirements. Protection systems based on access authorization are easily bypassed by novel attacks, and protection systems based on feature detection require continuous rule updates; traditional protections therefore struggle to defend against novel attacks in time and lack the ability to perceive unknown attacks. Consequently, how to introduce automated, intelligent methods that detect abnormal behavior in a network and discover new or unknown attacks in time has become one of the hot research problems in network security.
Innovations in artificial intelligence and machine learning have brought new research trends to software communication behavior anomaly detection. For example, deep learning algorithms are used to analyze and model large-scale software traffic data, and the larger computational coverage greatly shortens detection time, bringing timeliness and effectiveness into a practical range; baseline behavior modeling based on machine learning improves the accuracy of anomaly discovery; and behavior anomaly analysis systems that combine behavior analysis with pattern recognition make anomaly detection more intelligent and more widely applicable while greatly improving detection accuracy and timeliness. However, software communication behavior anomaly detection still faces challenges such as large data volumes, complex features, and low algorithm efficiency.
Disclosure of Invention
In order to observe anomalies in the network communication behavior of application software, the invention provides a method and a system for detecting abnormal behavior of application software based on network communication similarity, which obtain the network communication behavior of the application software from the features of network data packets and calculate the similarity of that behavior.
To achieve the above object, the invention provides the following technical solution.
An application software abnormal behavior detection method based on network communication similarity, the method comprising:
grouping data packets according to data packet features, and constructing each group of data packets into a software behavior chain;
generating, for each software behavior chain, a feature vector describing the statistical features of the behavior chain;
and obtaining an abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical features of the behavior chain and the center vector of each cluster in a baseline model, wherein the baseline model is constructed based on normal application software communication traffic data.
Further, the grouping of the data packets according to the data packet characteristics, and constructing each group of data packets as a software behavior chain, includes:
receiving data packets of network traffic by using a sliding window;
Extracting the data packet characteristics of each data packet, wherein the data packet characteristics comprise a protocol type, a departure time, an arrival time, a source address, a destination address, a source port, a destination port, a payload, a software signature, a data packet length, TCP handshake metadata, a total number of bytes, a maximum value/minimum value/average value/standard deviation of a packet length, a maximum value/minimum value/average value/standard deviation of a packet interval time and an information entropy;
grouping data packets according to the software signature, the protocol type, the source address, the source port, the destination address, the destination port and the payload;
and sorting the grouped data packets by departure time to obtain the corresponding software behavior chain.
Further, the grouping of the data packets according to the software signature, the protocol type, the source address, the source port, the destination address, the destination port and the payload includes:
acquiring a data packet grouping strategy, wherein the data packet grouping strategy comprises:
data packets with the same software signature are divided into a group;
and,
for data packets whose protocol type is TCP, data packets from the same TCP connection are divided into a group;
and,
data packets with the same protocol type, the same source address, the same source port, the same destination address and the same destination port are divided into a group;
and,
data packets having the same specific file type or keyword in the payload are divided into a group;
and dividing data packets into a group when the data packet features of at least two data packets satisfy the grouping strategy.
Further, the generating, for each software behavior chain, of a feature vector describing the statistical features of the behavior chain includes:
acquiring the vector representations x_i of all data packets in the software behavior chain;
obtaining the vector representation s_i corresponding to each vector representation x_i through a self-attention calculation that analyzes the correlation among the features of the data packet;
combining the vector representations s_i corresponding to the software behavior chain, applying batch normalization to the combined result, and obtaining the vector representation corresponding to the combined result through a self-attention calculation that analyzes the correlation among data packets;
and passing the vector representation corresponding to the combined result through a fully connected layer and a softmax layer in sequence to obtain the feature vector describing the statistical features of the behavior chain.
Further, the behavior chain statistical features include vector dimension, average, standard deviation, median absolute deviation, skewness, kurtosis, and Mahalanobis distance.
Further, the construction of the baseline model based on normal application software communication traffic data comprises:
constructing a plurality of normal software behavior chains based on normal application software communication traffic data;
and generating, for each normal software behavior chain, a feature vector describing the statistical features of the behavior chain, and then clustering the feature vectors to obtain the baseline model.
Further, the generating of the feature vectors and the subsequent clustering to obtain the baseline model includes:
treating each feature vector describing the statistical features of a behavior chain as an independent cluster;
generating a plurality of clusters V_i^(0) by calculating the similarity between the feature vectors, wherein i is a positive integer;
generating a center vector of each cluster V_i^(t-1) based on the variance of the similarity between each feature vector and the other feature vectors in the cluster V_i^(t-1), wherein t is the iteration round;
generating a plurality of clusters V_i^(t) according to the similarity between the center vector of each cluster V_i^(t-1) and the center vectors of the other clusters V_j^(t-1), then setting t = t+1 and returning to the step of generating center vectors based on the variance of the similarity between each feature vector and the other feature vectors in the cluster V_i^(t-1), wherein j is a positive integer and j ≠ i;
and repeating until the number of clusters is unchanged between iteration round t and iteration round t+1, or the total number of clusters V_i^(t) in iteration round t reaches a set number k, to obtain the baseline model, wherein k is the number of types of basic network behavior.
Further, the similarity between the feature vectors is calculated from the following quantities: w_r is a learnable weight matrix, cov(β_i, β_j) is the covariance between feature vector β_i and feature vector β_j, σ(β_i) is the standard deviation of feature vector β_i, and σ(β_j) is the standard deviation of feature vector β_j.
Further, after the abnormal behavior detection result of the software behavior chain is obtained, the method further comprises:
manually reviewing the abnormal behavior detection result of the software behavior chain;
and, if the abnormal behavior detection result of the software behavior chain is a misjudgment, recalculating the similarity of the misjudged behavior and adjusting the learnable weight matrix w_r.
An application software abnormal behavior detection system based on network communication similarity, the system comprising:
a data packet grouping module, configured to group data packets according to data packet features and construct each group of data packets into a software behavior chain;
a vector generation module, configured to generate, for each software behavior chain, a feature vector describing the statistical features of the behavior chain;
and an anomaly detection module, configured to obtain an abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical features of the behavior chain and the center vector of each cluster in a baseline model, wherein the baseline model is constructed based on normal application software communication traffic data.
Compared with the prior art, the invention combines a deep learning model with behavior analysis and designs a method for analyzing network communication behavior similarity with the deep learning model. A self-attention mechanism captures long-distance dependencies in the data so that the features of a software behavior chain are represented more accurately. A clustering algorithm extracts baselines of similar behaviors to obtain a behavior baseline model. Real-time traffic is then processed and analyzed against the behavior baselines output by the baseline model, and whether the communication behavior of the application software is abnormal is judged by comparing the similarity between newly observed behavior and the normal behaviors in the model, which provides scientific data support for application software abnormal behavior detection.
Drawings
FIG. 1 is a flow chart of a method for detecting abnormal behavior of application software based on network communication similarity.
FIG. 2 is a schematic diagram of obtaining the behavior chain context through self-attention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
As shown in FIG. 1, the method for detecting abnormal behavior of application software based on network communication similarity comprises the following steps.
Step 1: group the data packets according to data packet features, and construct each group of data packets into a software behavior chain.
This embodiment uses a 256-bit sliding window to receive network data packets. When the sliding window reaches its upper limit, the data packets are grouped according to their features in order to capture a series of network behaviors of the software, and each group of data packets is constructed into a software behavior chain. In a behavior chain, each data packet represents one node of the chain.
Specifically, the invention parses the collected data packets one by one, extracts the protocol fields and the effective feature information, and classifies the data packets accordingly. The features used include, but are not limited to, protocol type (e.g., TCP, UDP, IP, DNS), departure time, arrival time, source IP, destination IP, source port, destination port, payload, software signature, packet length, TCP handshake metadata, total number of bytes, maximum/minimum/average/standard deviation of packet length, maximum/minimum/average/standard deviation of packet interval time, and information entropy. Based on the extracted features, predefined application signatures or rules are matched to determine the specific application that produced each data packet.
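The text lists information entropy among the packet features but does not define it. As an illustration only, the sketch below assumes the Shannon entropy of the payload byte distribution; the function name `payload_entropy` is hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def payload_entropy(payload: bytes) -> float:
    """Shannon entropy of the payload bytes, in bits per byte.

    Assumption: the patent's "information entropy" feature is computed over
    the byte-value distribution of the payload. Empty payloads return 0.0.
    """
    if not payload:
        return 0.0
    counts = Counter(payload)  # occurrences of each byte value
    total = len(payload)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

A payload with a single repeated byte scores 0, while a payload in which all 256 byte values are equally frequent scores 8 bits per byte, so high values tend to indicate compressed or encrypted content.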
The data packets can be grouped in the following ways, which may be used in combination:
(1) Data packets with the same software signature are grouped together;
(2) For TCP, data packets from the same TCP connection are grouped together;
(3) Data packets with the same protocol type, source address, source port, destination address and destination port are grouped together;
(4) Data packets whose payloads contain the same specific file type or keyword are grouped together.
The grouped data packets are sorted by departure time, and the features of each data packet are normalized and encoded to generate a representation vector for that data packet. A group of representation vectors forms a behavior chain, and each data packet represents one node of the behavior chain.
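The grouping and ordering step can be sketched as follows. This is a minimal illustration of grouping mode (3) only, keyed on protocol, source address, source port, destination address and destination port; the `Packet` fields and the function name `build_behavior_chains` are assumptions made for the example, and the other grouping modes and the feature normalization are left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    # Hypothetical minimal packet record; real packets carry many more features.
    protocol: str
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int
    departure_time: float
    length: int


def build_behavior_chains(packets):
    """Group packets by 5-tuple, then sort each group by departure time.

    Returns a dict mapping the 5-tuple key to an ordered list of packets,
    i.e. one behavior chain per key.
    """
    groups = defaultdict(list)
    for p in packets:
        key = (p.protocol, p.src_addr, p.src_port, p.dst_addr, p.dst_port)
        groups[key].append(p)
    return {
        key: sorted(group, key=lambda p: p.departure_time)
        for key, group in groups.items()
    }
```

Each value in the returned dictionary corresponds to one behavior chain whose nodes are the packets in departure order.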
Step 2: generate, for each software behavior chain, a feature vector describing the statistical features of the behavior chain.
The invention uses a self-attention mechanism to calculate the contextual relationships between the nodes of a behavior chain. From the self-attention output, the vector dimension is counted and statistical features such as the average and standard deviation are calculated, yielding a feature vector that describes the statistical features of the behavior chain.
In one embodiment, each node of a behavior chain first passes through a self-attention component that analyzes the correlation among the node's internal features and outputs a segment vector. The segment vectors are then combined, batch-normalized (Batch Normalization), and fed to a second self-attention layer, which analyzes the correlation among the nodes of the whole behavior chain. The result is batch-normalized again and passed to a fully connected layer that uses ReLU as its activation function, and the final output is obtained through a softmax layer. The specific flow is shown in FIG. 2.
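The self-attention calculation formula is as follows:
Let the input vectors be X = (x_1, x_2, ..., x_n) and the output vectors be Z = (z_1, z_2, ..., z_n); then:
$$Q = XW_Q,\quad K = XW_K,\quad V = XW_V,\qquad Z = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d}}\right)V$$
where W_Q, W_K and W_V are learnable weight matrices and d is the feature dimension.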
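The self-attention step can be illustrated with a small pure-Python sketch. It assumes the standard scaled dot-product form (queries, keys and values obtained from the learnable matrices W_Q, W_K, W_V, with scores scaled by the square root of the feature dimension d); the helper names are hypothetical, and a real implementation would use a deep learning framework with trained weights.

```python
import math


def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X, W_Q, W_K, W_V):
    """Scaled dot-product self-attention over the rows of X.

    Returns (Z, weights): Z holds one output vector per input row, and
    weights[i][j] is how much row i attends to row j.
    """
    Q, K, V = matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V)
    d = len(Q[0])  # feature dimension used for scaling
    K_T = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(Q, K_T)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, V), weights
```

Each row of `weights` sums to 1, so every output vector is a weighted average of the value vectors of all nodes in the chain.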
Thus, from the output vector α = (s_1, s_2, ..., s_n) of the self-attention component, the feature vector β = (y_1, y_2, ..., y_n) is calculated, where the components of β have the following meaning:
β = (vector dimension, average, standard deviation, median absolute deviation, skewness, kurtosis, Mahalanobis distance), where the median absolute deviation represents how far the concentration location of the behavior chain's behavioral features deviates from the average, the skewness represents whether the behavior of the behavior chain leans left or right, the kurtosis represents the peakedness of the behavior of the behavior chain, and the Mahalanobis distance represents the similarity between a node and the average.
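The calculation formulas are as follows:
Vector dimension: n
Average: $\mu = (x_1 + x_2 + \dots + x_n)/n$
Standard deviation: $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\mu)^2}$
Median: m
Median absolute deviation: $\mathrm{MAD} = \mathrm{mean}(|x_i - \mathrm{mean}(\alpha)|)$
Skewness: $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^{3}$
Kurtosis: $\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\mu}{\sigma}\right)^{4}$
Mahalanobis distance: $D = \sqrt{\gamma S^{-1}\gamma^{T}}$, where S is the covariance matrix and $\gamma = (x_1-\mu, \dots, x_n-\mu)$.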
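The scalar statistics can be sketched with the Python standard library. Several choices here are assumptions rather than the patent's definitions: population (1/n) moments are used for the standard deviation, skewness and kurtosis; kurtosis is not reduced by 3; and the deviation statistic follows the formula as printed in the text (mean of absolute deviations from the mean), whereas the conventional median absolute deviation would use medians instead. The Mahalanobis distance is omitted because it needs a covariance matrix S that the text does not specify further. The function name `chain_statistics` is hypothetical.

```python
import statistics


def chain_statistics(values):
    """Summary statistics of a non-empty list of numbers.

    Assumptions: population moments (divide by n), non-excess kurtosis,
    and the absolute-deviation statistic taken about the mean as printed.
    """
    n = len(values)
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    median = statistics.median(values)
    abs_dev = statistics.fmean(abs(v - mu) for v in values)
    if sigma == 0:
        # All values identical: standardized moments are undefined, use 0.0.
        skewness = kurtosis = 0.0
    else:
        skewness = statistics.fmean(((v - mu) / sigma) ** 3 for v in values)
        kurtosis = statistics.fmean(((v - mu) / sigma) ** 4 for v in values)
    return {
        "dimension": n,
        "mean": mu,
        "std": sigma,
        "median": median,
        "abs_dev": abs_dev,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }
```

For a symmetric input such as `[1, 2, 3, 4, 5]` the skewness is 0, which matches the reading of skewness as a left or right lean of the behavior.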
Step 3: obtain an abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical features of the behavior chain and the center vector of each cluster in the baseline model.
The baseline model of the invention is constructed from normal application software traffic data. The data collected for the training set consist entirely of normal network communication behavior of the software, with no abnormal data, and training yields a baseline of the software's normal behavior. Invalid traffic in the training set, such as duplicates, TCP retransmissions and HTTP handshake data packets, needs to be removed, and repeated traffic is merged. The preprocessed traffic data are then used as subsequent input data.
After the data have been processed, the feature vectors are clustered with a clustering algorithm. During clustering, the similarity between feature vectors serves as the classification criterion. The similarity distance between the feature vectors and the center vector of each cluster is recorded, and the largest similarity distance is selected as the baseline standard for judging abnormal behavior. The trained model becomes the baseline model that the invention uses to detect abnormal behavior.
In this embodiment, the feature vectors are classified with a clustering algorithm to construct the baseline model. For the feature vectors β_1, β_2, ..., β_m, a hierarchical clustering method is adopted; the number of clusters after classification is set to k, and the value of k is determined by the number of basic network behavior types of the software.
First, each vector is regarded as an independent cluster, the similarity between every pair of feature vectors is calculated, and the most similar vectors are merged into the same cluster. Next, the center vector of each cluster is determined: for each vector in the cluster, the variance of its similarity to the other vectors in the cluster is calculated, and the vector with the smallest variance is taken as the center vector; if the variances are equal, one vector is selected at random as the center vector of the cluster. Finally, the similarity between the center vectors of the clusters is calculated, and clusters with higher similarity are merged. This process is repeated until the classification no longer changes from the previous round or the number of clusters is k. After classification is completed, each cluster has a center vector; the similarity between each vector in the cluster and the center vector is calculated, and the lowest similarity is selected as the baseline value. The resulting model is the baseline model.
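The clustering procedure can be sketched as a simple agglomerative loop. The text leaves several details open, so this is one possible reading rather than the patented algorithm: exactly one pair of clusters (the pair with the most similar centers) is merged per round, ties in the center selection are broken by taking the first candidate instead of a random one, and `merge_threshold` is a hypothetical parameter that stops merging when no pair is similar enough. The similarity function `sim` is passed in by the caller.

```python
import statistics


def center_vector(cluster, sim):
    """Member whose similarities to the other members have the smallest variance."""
    if len(cluster) == 1:
        return cluster[0]

    def spread(idx):
        sims = [sim(cluster[idx], u) for j, u in enumerate(cluster) if j != idx]
        return statistics.pvariance(sims)

    # Ties go to the first candidate here; the text picks one at random.
    return cluster[min(range(len(cluster)), key=spread)]


def build_baseline(vectors, sim, k, merge_threshold=float("-inf")):
    """Merge clusters until k remain; return a list of (center, lowest_similarity)."""
    clusters = [[v] for v in vectors]  # every vector starts as its own cluster
    while len(clusters) > k:
        centers = [center_vector(c, sim) for c in clusters]
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = sim(centers[i], centers[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        if best is None or best[0] < merge_threshold:
            break  # nothing similar enough to merge: clustering is stable
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    baseline = []
    for c in clusters:
        center = center_vector(c, sim)
        lowest = min(sim(v, center) for v in c)  # baseline value of the cluster
        baseline.append((center, lowest))
    return baseline
```

The returned pairs hold, for each cluster, its center vector and the lowest similarity of any member to that center, which is the value the text uses as the cluster's baseline.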
The similarity of two feature vectors β_i and β_j is calculated from the following quantities: w_r is a learnable weight matrix that adjusts the similarity between vectors, cov(β_i, β_j) = E[β_i β_j] − E[β_i]E[β_j] is the covariance between β_i and β_j, and σ(β_i) and σ(β_j) are the standard deviations of the vectors β_i and β_j.
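The similarity formula itself is not reproduced in this text, so the sketch below is an assumption built only from the quantities that are named: a weighted Pearson-style correlation, w_r · cov(β_i, β_j) / (σ(β_i) σ(β_j)). For simplicity w_r is treated as a scalar here, although the text calls it a weight matrix; the function name `similarity` is hypothetical.

```python
import statistics


def similarity(beta_i, beta_j, w_r=1.0):
    """Assumed form: w_r * cov(beta_i, beta_j) / (sigma_i * sigma_j).

    cov is computed as E[xy] - E[x]E[y], matching the definition in the text.
    Returns 0.0 when either vector is constant (zero standard deviation).
    """
    mean_i = statistics.fmean(beta_i)
    mean_j = statistics.fmean(beta_j)
    cov = statistics.fmean(a * b for a, b in zip(beta_i, beta_j)) - mean_i * mean_j
    sigma_i = statistics.pstdev(beta_i)
    sigma_j = statistics.pstdev(beta_j)
    if sigma_i == 0 or sigma_j == 0:
        return 0.0
    return w_r * cov / (sigma_i * sigma_j)
```

Under this assumed form, with w_r = 1 the value lies between -1 and 1, and raising or lowering w_r rescales how similar two behaviors appear, which is how a manual correction could influence later clustering and detection.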
During testing, the similarity between each feature vector and the center vector of each cluster in the baseline model is calculated; if a feature vector cannot be assigned to any cluster, the corresponding behavior chain is regarded as abnormal communication behavior of the software.
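The detection rule can be sketched as a comparison against each cluster's baseline value. The representation of the baseline model as a list of (center vector, lowest similarity) pairs and the function name `detect` are assumptions for illustration; a feature vector is treated as belonging to a cluster when its similarity to the center is at least that cluster's baseline value.

```python
def detect(feature, baseline, sim):
    """Return (is_abnormal, cluster_index).

    baseline: list of (center_vector, lowest_similarity) pairs (assumed layout).
    sim: similarity function taking two vectors.
    A chain is abnormal when it fits no cluster; otherwise the index of the
    best-matching cluster is returned.
    """
    best_idx, best_sim = None, None
    for idx, (center, lowest) in enumerate(baseline):
        s = sim(feature, center)
        if s >= lowest and (best_sim is None or s > best_sim):
            best_idx, best_sim = idx, s
    return best_idx is None, best_idx
```

Returning the matched cluster index alongside the flag makes it easier for a reviewer to see which normal behavior a chain was closest to.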
Step 4: optimize the baseline model through manual review.
Abnormal communication behaviors detected by the model need to be reviewed manually. If a misjudgment is found, the similarity of the misjudged behavior is recalculated, and the learnable weight matrix w_r used to calculate the similarity in step 3 is adjusted to optimize the baseline model.
To sum up, in order to obtain a series of network behaviors of the software, the invention groups the collected network data packets according to the characteristics of the data packets, and distinguishes the data packets sent by each software and the different behaviors represented by the data packets. The feature types used include, but are not limited to, statistical features, protocol-based features, time sequence features, manually labeled features, deep learning model extracted features, and the like, and effective features are extracted by combining the features and feature extraction methods. Classifying the data packets according to the characteristics based on predefined rules, and finally normalizing the characteristics.
The invention obtains the context relation of network communication behaviors based on self-attention. Self-attention is an attention mechanism for processing sequence data that allows a model to assign different attention weights to different locations in a sequence as it is processed, more flexibly capturing dependencies in a sequence. Therefore, the invention uses the self-attention component to process the software communication behavior, firstly obtains the relation inside a single node in the behavior to obtain the segment vector of the single node, and then combines the segment vectors to obtain the context relation of the whole behavior chain.
The statistical features of the invention reveal the behavior trend and the degree of dispersion of a software behavior chain. For the vectors output by self-attention, the dimension is counted and statistics such as the average, standard deviation, variance, skewness, median absolute deviation and kurtosis are calculated to obtain a feature vector representing the statistical features of the communication behavior.
The specific number of categories in the invention is determined by the categories of software behavior. Because the basic behaviors of different software are similar, clustering can place basic behaviors from different software in the same class, while behaviors specific to one piece of software form a class of their own; after classification, the similarity baseline of each class is recorded.
The above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and those skilled in the art may modify or substitute the technical solution of the present invention, and the scope of the present invention is defined by the claims.

Claims (7)

Translated from Chinese
1. A method for detecting abnormal behavior of application software based on network communication similarity, characterized in that the method comprises:
grouping data packets according to data packet features, and constructing each group of data packets into a software behavior chain;
generating, for each software behavior chain, a feature vector describing the statistical features of the behavior chain; wherein the generating of the feature vector describing the statistical features of the behavior chain includes:
obtaining the vector representations x_i of all data packets in the software behavior chain;
obtaining the vector representation s_i corresponding to each vector representation x_i through a self-attention calculation that analyzes the correlation among data packet features;
combining the vector representations s_i corresponding to the software behavior chain, applying batch normalization to the combined result, and then obtaining the vector representation corresponding to the combined result through a self-attention calculation that analyzes the correlation among data packets;
passing the vector representation corresponding to the combined result through a fully connected layer and a softmax layer in sequence to obtain the feature vector describing the statistical features of the behavior chain;
obtaining an abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical features of the behavior chain and the center vector of each cluster in a baseline model; wherein the baseline model is constructed based on normal application software communication traffic data, and the construction process of the baseline model includes:
constructing a plurality of normal software behavior chains based on normal application software communication traffic data;
generating, for each normal software behavior chain, a feature vector describing the statistical features of the behavior chain, and then clustering to obtain the baseline model; wherein the generating of the feature vectors and subsequent clustering to obtain the baseline model includes:
treating each feature vector describing the statistical features of a behavior chain as an independent cluster;
generating a plurality of clusters V_i^(0) by calculating the similarity between feature vectors, where i is a positive integer;
generating the center vector of each cluster V_i^(t-1) based on the variance of the similarity between each feature vector and the other feature vectors in the cluster V_i^(t-1), where t is the iteration round;
after generating a plurality of clusters V_i^(t) according to the similarity between the center vector of each cluster V_i^(t-1) and the center vectors of the other clusters V_j^(t-1), setting t = t+1 and returning to the step based on the variance of the similarity between each feature vector and the other feature vectors in the cluster V_i^(t-1), where j is a positive integer and j ≠ i;
until the number of clusters does not change between iteration round t and iteration round t+1, or the total number of clusters V_i^(t) in iteration round t reaches the set number k, obtaining the baseline model, where k is the number of types of basic network behavior.

2. The method according to claim 1, characterized in that the grouping of data packets according to data packet features and constructing each group of data packets into a software behavior chain comprises:
receiving data packets of network traffic using a sliding window;
extracting the data packet features of each data packet, the data packet features including: protocol type, departure time, arrival time, source address, destination address, source port, destination port, payload, software signature, data packet length, TCP handshake metadata, total number of bytes, maximum/minimum/average/standard deviation of packet length, maximum/minimum/average/standard deviation of packet interval time, and information entropy;
grouping data packets according to software signature, protocol type, source address, source port, destination address, destination port and payload;
sorting the grouped data packets by departure time to obtain the corresponding software behavior chain.

3. The method according to claim 2, characterized in that the grouping of data packets according to software signature, protocol type, source address, source port, destination address, destination port and payload comprises:
obtaining a data packet grouping strategy, the data packet grouping strategy comprising:
data packets with the same software signature are grouped together;
and,
for data packets whose protocol type is TCP, data packets from the same TCP connection are grouped together;
and,
data packets with the same protocol type, source address, source port, destination address and destination port are grouped together;
and,
data packets with the same specific file type or keyword in the payload are grouped together;
when the data packet features of at least two data packets satisfy the grouping strategy, grouping those data packets together.

4. The method according to claim 1, characterized in that the behavior chain statistical features include: vector dimension, mean, standard deviation, median, median absolute deviation, skewness, kurtosis and Mahalanobis distance.

5. The method according to claim 1, characterized in that the similarity between the feature vectors is calculated from the following quantities: w_r is a learnable weight matrix, Cov(β_i, β_j) is the covariance between feature vector β_i and feature vector β_j, σ(β_i) is the standard deviation of feature vector β_i, and σ(β_j) is the standard deviation of feature vector β_j.

6. The method according to claim 5, characterized in that, after obtaining the abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical features of the behavior chain and the center vector of each cluster in the baseline model, the method further includes:
manually reviewing the abnormal behavior detection result of the software behavior chain;
if the abnormal behavior detection result of the software behavior chain is a misjudgment, recalculating the similarity of the misjudged behavior and adjusting the learnable weight matrix w_r.

7. A system for detecting abnormal behavior of application software based on network communication similarity, characterized in that the system comprises:
a data packet grouping module, configured to group data packets according to data packet features and construct each group of data packets into a software behavior chain;
a vector generation module, configured to generate, for each software behavior chain, a feature vector describing the statistical features of the behavior chain, wherein the generating of the feature vector describing the statistical features of the behavior chain includes:
obtaining the vector representations x_i of all data packets in the software behavior chain;
obtaining the vector representation s_i corresponding to each vector representation x_i through a self-attention calculation that analyzes the correlation among data packet features;
The vector representations s_i corresponding to the software behavior chain are combined, and the combined
result is subjected to batch normalization, and then the vector representation corresponding to the combined result is obtained by analyzing the self-attention calculation of the correlation between the data packets;将该组合结果所对应的向量表示依次通过全连接层和softmax层,得到描述行为链统计特征的特征向量;The vector representation corresponding to the combined result is passed through the fully connected layer and the softmax layer in sequence to obtain a feature vector describing the statistical characteristics of the behavior chain;异常检测模块,用于基于该描述行为链统计特征的特征向量与基线模型中的每个簇的中心向量的相似度,得到该软件行为链的异常行为检测结果;其中,所述基线模型基于正常的应用软件通信流量数据构建,所述基线模型的构建过程包括:The anomaly detection module is used to obtain the abnormal behavior detection result of the software behavior chain based on the similarity between the feature vector describing the statistical characteristics of the behavior chain and the central vector of each cluster in the baseline model; wherein the baseline model is constructed based on normal application software communication traffic data, and the construction process of the baseline model includes:基于正常的应用软件通信流量数据,构建若干正常的软件行为链;Based on normal application software communication traffic data, build several normal software behavior chains;生成每一正常的软件行为链的描述行为链统计特征的特征向量后进行聚类处理,得到基线模型;其中,所述生成每一正常的软件行为链的描述行为链统计特征的特征向量后进行聚类处理,得到基线模型,包括:Generate a feature vector describing the statistical characteristics of each normal software behavior chain, and then perform clustering processing to obtain a baseline model; wherein the generating a feature vector describing the statistical characteristics of each normal software behavior chain, and then perform clustering processing to obtain a baseline model, includes:将每一描述行为链统计特征的特征向量视为独立的簇;Each feature vector describing the statistical characteristics of the behavior chain is considered as an independent cluster;通过计算特征向量之间的相似度,生成若干簇Vi(0),i为正整数;By calculating the similarity between feature vectors, several clusters Vi(0) are generated, where i is a positive 
integer;基于簇Vi(t-1)内每个特征向量与其他特征向量的相似度的方差,生成该簇Vi(t-1)的中心向量;其中,t为迭代轮次;Based on the variance of the similarity between each feature vector in cluster Vi(t-1) and other feature vectors, the center vector of cluster Vi(t-1) is generated; where t is the iteration round;根据每个簇Vi(t-1)的中心向量和其他簇Vj(t-1)的中心向量的相似度生成若干簇Vi(t)后,令t=t+1,并返回至所述基于簇Vi(t-1)内每个特征向量与其他特征向量的相似度的方差;其中,j为正整数且j≠i;After generating several clusters Vi(t) according to the similarity between the central vector of each cluster V i (t-1)and the central vector of other clusters Vj(t-1) , let t=t+1, and return to the variance based on the similarity between each feature vector in cluster Vi(t-1) and other feature vectors; wherein j is a positive integer and j≠i;直到迭代轮次t和迭代轮次t+1中簇的数量未发生改变,或者迭代轮次t中簇Vi(t)的总数为设定数量k,得到基线模型;其中,k为基本网络行为的种类数量。The baseline model is obtained until the number of clusters in iteration round t and iteration round t+1 does not change, or the total number of clustersVi(t) in iteration round t reaches the set number k, where k is the number of types of basic network behaviors.
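The baseline construction and detection steps recited in the claims can be illustrated with a minimal Python sketch. It is not the patented implementation, and several points are assumptions made only for illustration: the similarity is taken to be a Pearson correlation scaled by a scalar weight `w` (standing in for the learnable weight matrix w_r, whose exact formula is not reproduced in the text); the center vector is taken to be the cluster member whose similarities to the other members have the smallest variance; only the single most similar pair of clusters is merged per round; and `merge_threshold` and `threshold` are hypothetical parameters that do not appear in the claims.

```python
import math
from itertools import combinations


def similarity(a, b, w=1.0):
    """Assumed form: w * Cov(a, b) / (sigma(a) * sigma(b)), with scalar w."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    if std_a == 0 or std_b == 0:
        return 0.0  # constant vector: correlation undefined, treat as dissimilar
    return w * cov / (std_a * std_b)


def center(cluster):
    """Assumed reading of the claim: the member whose similarities to the
    other members have the smallest variance is used as the center vector."""
    if len(cluster) == 1:
        return cluster[0]

    def similarity_variance(v):
        sims = [similarity(v, u) for u in cluster if u is not v]
        mean = sum(sims) / len(sims)
        return sum((s - mean) ** 2 for s in sims) / len(sims)

    return min(cluster, key=similarity_variance)


def build_baseline(vectors, k, merge_threshold=0.9):
    """Start with one cluster per vector, then merge the most similar pair of
    clusters each round until k clusters remain or nothing more can merge."""
    clusters = [[v] for v in vectors]
    while len(clusters) > k:
        centers = [center(c) for c in clusters]
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: similarity(centers[p[0]], centers[p[1]]))
        if similarity(centers[i], centers[j]) < merge_threshold:
            break  # the number of clusters no longer changes
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [center(c) for c in clusters]


def is_anomalous(vector, centers, threshold=0.8):
    """Flag a behavior chain whose best match among the centers is weak."""
    return max(similarity(vector, c) for c in centers) < threshold


# Toy feature vectors: three "rising" chains and three "falling" chains.
normal = [[1, 2, 3, 4], [2, 4, 6, 8.5], [1, 2.1, 3, 4.2],
          [4, 3, 2, 1], [8, 6, 4.1, 2], [4.2, 3, 2, 1.1]]
baseline_centers = build_baseline(normal, k=2)
```

In this toy setup, `is_anomalous([1, 2, 3, 4.1], baseline_centers)` compares a new feature vector against the two baseline centers; a vector that correlates only weakly with every center is reported as abnormal.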
CN202410698083.XA | 2024-05-31 | 2024-05-31 | Method and system for detecting abnormal behavior of application software based on network communication similarity | Active | CN118713865B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202410698083.XA (CN118713865B (en)) | 2024-05-31 | 2024-05-31 | Method and system for detecting abnormal behavior of application software based on network communication similarity

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202410698083.XA (CN118713865B (en)) | 2024-05-31 | 2024-05-31 | Method and system for detecting abnormal behavior of application software based on network communication similarity

Publications (2)

Publication Number | Publication Date
CN118713865A (en) | 2024-09-27
CN118713865B (en) | 2025-03-04

Family

ID=92812204

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202410698083.XA (Active, CN118713865B (en)) | Method and system for detecting abnormal behavior of application software based on network communication similarity | 2024-05-31 | 2024-05-31

Country Status (1)

Country | Link
CN (1) | CN118713865B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119539217B (en)* | 2025-01-23 | 2025-04-25 | 深圳市南方国讯科技有限公司 | A logistics scheduling management system based on big data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US7023979B1 (en)* | 2002-03-07 | 2006-04-04 | Wai Wu | Telephony control system with intelligent call routing
CN104901971A (en)* | 2015-06-23 | 2015-09-09 | 北京东方棱镜科技有限公司 | Method and device for carrying out safety analysis on network behaviors

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
KR102028093B1 (en)* | 2017-10-25 | 2019-10-02 | 한국전자통신연구원 | Method of detecting abnormal behavior on the network and apparatus using the same
CN108270620B (en)* | 2018-01-15 | 2020-07-31 | 深圳市联软科技股份有限公司 | Network anomaly detection method, device, equipment and medium based on portrait technology
CN109951462B (en)* | 2019-03-07 | 2020-08-25 | 中国科学院信息工程研究所 | Application software flow anomaly detection system and method based on holographic modeling

Also Published As

Publication number | Publication date
CN118713865A (en) | 2024-09-27

Similar Documents

Publication | Title
CN112953924B (en) | Network abnormal flow detection method, system, storage medium, terminal and application
CN111915437B (en) | Training method, device, equipment and medium of money backwashing model based on RNN
CN110391958B (en) | Method for automatically extracting and identifying characteristics of network encrypted flow
CN112381121A (en) | Unknown class network flow detection and identification method based on twin network
CN114615093A (en) | Anonymous network traffic identification method and device based on traffic reconstruction and inheritance learning
Eljialy et al. | Novel framework for an intrusion detection system using multiple feature selection methods based on deep learning
CN113723440B (en) | Encryption TLS application flow classification method and system on cloud platform
CN110225030A (en) | Malice domain name detection method and system based on RCNN-SPP network
CN109951462B (en) | Application software flow anomaly detection system and method based on holographic modeling
CN109639734B (en) | Abnormal flow detection method with computing resource adaptivity
Peraković et al. | Artificial neuron network implementation in detection and classification of DDoS traffic
CN118713865B (en) | Method and system for detecting abnormal behavior of application software based on network communication similarity
CN112134862B (en) | Coarse-fine granularity hybrid network anomaly detection method and device based on machine learning
CN111641598A (en) | Intrusion detection method based on width learning
CN117614742B (en) | Malicious traffic detection method with enhanced honey point perception
CN117579324B (en) | Intrusion detection method based on gating time convolution network and graph
CN114553790A (en) | A small sample learning method and system for IoT traffic classification based on multimodal features
CN118051818A (en) | Internet of things equipment identification method based on federal learning and behavior analysis
Shao et al. | Deep learning hierarchical representation from heterogeneous flow-level communication data
CN117892125A (en) | Multi-class unbalanced network traffic data enhancement method based on improved generation of countermeasure network
Sharipuddin et al. | Intrusion detection with deep learning on internet of things heterogeneous network
CN116647374A (en) | Network flow intrusion detection method based on big data
Yang et al. | IoT botnet detection with feature reconstruction and interval optimization
CN120017304A (en) | A method for detecting encrypted malicious traffic based on deep learning
CN119172143A (en) | A method, system, device and medium for classifying and identifying malicious traffic based on graph convolutional neural network

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
