Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of the invention.
Referring to Fig. 1, the invention provides a private cloud security management method based on multidimensional feature data analysis, comprising the following steps. S1, multidimensional feature data acquisition: multidimensional feature data in the private cloud environment is acquired in real time, the multidimensional feature data comprising user behavior logs, system usage records and resource access records.
The multidimensional feature data is the core foundation of private cloud security management and mainly comprises four kinds of key data:
1. User behavior logs, which record the operation track of each user in the private cloud, such as logins, file accesses and permission changes. They are used to identify abnormal operation behaviors (such as unauthorized access and high-frequency sensitive operations) and are an important basis for constructing the permission abuse index.
2. Network traffic data, which covers network communication inside and at the boundary of the private cloud, including source/destination IP addresses, ports and protocols. It is used to analyze lateral penetration paths and abnormal traffic characteristics (such as malicious code propagation) and supports the calculation of the lateral penetration risk index.
3. System usage records, which collect the real-time utilization of resources such as CPU, memory and storage. Comparing them with historical averages yields the system resource anomaly rate, which is used to detect resource-exhaustion attacks (such as DDoS) or abnormal configurations.
4. Resource access records, which record the frequency and path with which each service partition accesses resources such as storage and databases. They are used to construct the access relationship graph among service partitions and support the path analysis of the lateral penetration risk index.
S2, service partition isolation: the private cloud is micro-segmented based on business logic to generate a plurality of independent service partitions.
The independent service partitions are divided using a fine-grained network segmentation strategy based on the functional characteristics, security requirements and data sensitivity of the different business systems. Resources of the same or related businesses are concentrated in a specific partition, and network traffic crossing partitions is restricted by access control policies, thereby achieving security isolation among different businesses and effectively reducing the risk of lateral penetration.
It should be noted that dividing the private cloud into a plurality of independent service partitions achieves the following objectives:
1. Risk isolation: limiting the lateral spread of security threats among different business systems.
2. Resource management and control: allocating independent computing, storage and network resources according to business requirements.
3. Compliance support: meeting industry data isolation requirements (e.g., GDPR, HIPAA).
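The restriction of cross-partition traffic by an access control policy can be illustrated with a default-deny allow-list. This is only a minimal sketch: the partition names, the `ALLOWED_FLOWS` set and the function `is_flow_allowed` are hypothetical and are not specified by the method itself.

```python
# Hypothetical cross-partition allow-list: (source, destination, port).
ALLOWED_FLOWS = {
    ("web", "order", 8443),
    ("order", "database", 5432),
}


def is_flow_allowed(src: str, dst: str, port: int) -> bool:
    """Default-deny check for traffic between service partitions."""
    if src == dst:
        # Traffic inside one partition is not restricted in this sketch.
        return True
    return (src, dst, port) in ALLOWED_FLOWS
```

Any flow that is not explicitly listed is rejected, which is one simple way to keep a compromised partition from reaching its neighbours.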
S3, internal network security analysis: a lateral penetration risk index and a permission abuse index are constructed from the user behavior logs and resource access records of each independent service partition, and the internal network security anomaly condition of the private cloud is evaluated.
In a preferred embodiment of the present invention, the lateral penetration risk index is analyzed as follows. Resource access records are extracted from the multidimensional feature data in the private cloud environment; for each resource access record of each independent service partition, the number of the other independent service partition being accessed is obtained; and the number of accesses from each independent service partition to each other independent service partition is counted and denoted $N_{ij}$, where $i$ and $j$ are the numbers of the independent service partitions, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n$, $n$ is the number of independent service partitions, and $i \neq j$.
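The counting step can be sketched as follows. The record layout, a pair of source and target partition numbers, and the function name `count_cross_partition_accesses` are assumptions made for illustration only.

```python
from collections import Counter


def count_cross_partition_accesses(records):
    """Count accesses from each source partition to each other partition.

    `records` is an iterable of (source_partition, target_partition) pairs;
    this layout is an assumption made for the sketch.
    """
    counts = Counter()
    for src, dst in records:
        if src != dst:  # self-to-self accesses are excluded
            counts[(src, dst)] += 1
    return counts
```

The resulting counter maps each ordered pair of distinct partitions to its access count; pairs that never occur simply read as zero.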
The shortest path length from each independent service partition $i$ to each other independent service partition $j$ is obtained with the breadth-first search algorithm and denoted $L_{ij}$.
It should be noted that breadth-first search is a graph search algorithm that expands outward layer by layer from a starting node. During the search, nodes closer to the starting node are visited first, until the target node is found or all nodes have been traversed. In the network architecture of the private cloud, each independent service partition is regarded as a node of the graph, and the connections between partitions are regarded as edges. The search begins at the starting node $i$, which is marked as visited and placed in a queue.
While the queue is not empty, the node at the head of the queue is taken out and checked to see whether it is the target node. If so, the shortest path from $i$ to $j$ has been found and the search ends.
If not, all unvisited neighboring nodes of that node are marked as visited and added to the queue. These are the other independent service partitions directly connected to the current node.
As the search proceeds, nodes farther and farther from the starting node are traversed. Each time a new node is reached, its distance (path length) to the starting node is recorded.
When the target node $j$ is found, the shortest path length from the starting node $i$ to the target node $j$ is obtained from the recorded distance information.
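The search described above can be sketched as a standard breadth-first traversal. As a simplification, this version does not stop at a single target node but records the hop count to every reachable partition, so one call per starting partition yields all required path lengths. The adjacency-dictionary representation and the function name are assumptions of the sketch.

```python
from collections import deque


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: hop counts from `start` to every reachable partition.

    `adjacency` maps a partition to the partitions directly connected to it.
    """
    distance = {start: 0}  # doubles as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()  # take the node at the head of the queue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

Partitions that cannot be reached from the starting partition are simply absent from the returned dictionary.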
The lateral penetration risk index $LPRV$ is obtained with the formula

$$LPRV = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \frac{N_{ij} \cdot V_j}{L_{ij}}$$

where $V_j$ denotes the preset asset value of the $j$-th independent service partition, predefined by the business system according to asset importance.
It should be noted that the above formula is constructed according to the following logic:
1. Numerator: $N_{ij}$ is the number of accesses from independent service partition $i$ to partition $j$ and reflects the frequency of business interaction between the two partitions. $V_j$ is the asset value of the $j$-th independent service partition, predefined by the business system according to asset importance, and reflects how important the assets of the target partition are. Multiplying the two means that the more frequent the access and the higher the asset value of the target partition, the greater the contribution to the risk index.
2. Denominator: $L_{ij}$ is the shortest path length from independent service partition $i$ to partition $j$ obtained with the breadth-first search algorithm. The longer the path, the more difficult it is to penetrate from the source partition to the target partition, and the lower the risk.
3. Double summation: summing over all independent service partitions $i$ and $j$ ($i \neq j$, $i$ from 1 to $n$, $j$ from 1 to $n$) yields a composite risk value for the whole private cloud environment based on the access conditions, asset values and penetration difficulty among the different service partitions. Because the relations between all partitions are included in the calculation, no potential risk path is omitted.
4. Scaling: dividing by $n(n-1)$ scales the result into a reasonable range, so that the lateral penetration risk index $LPRV$ can be compared across private cloud environments of different sizes. Since $n(n-1)$ is the total number of ordered pairs of distinct partitions among the $n$ partitions (self-to-self pairs excluded), it normalizes the composite risk value.
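The construction logic above (access count times target asset value, divided by the shortest path length, summed over all ordered pairs of distinct partitions and divided by the number of such pairs) can be sketched as follows. The dictionary-based inputs and the handling of unreachable pairs are assumptions of this sketch, not details given in the description.

```python
def lateral_penetration_risk(access_counts, asset_values, path_lengths):
    """Average of count * target asset value / path length over ordered pairs i != j.

    access_counts and path_lengths are dicts keyed by (i, j);
    asset_values is a dict keyed by partition number.
    """
    partitions = list(asset_values)
    n = len(partitions)
    if n < 2:
        return 0.0
    total = 0.0
    for i in partitions:
        for j in partitions:
            if i == j:
                continue
            length = path_lengths.get((i, j))
            if not length:
                # Assumption: an unreachable pair contributes no risk.
                continue
            total += access_counts.get((i, j), 0) * asset_values[j] / length
    return total / (n * (n - 1))
```

With two partitions, hypothetical counts `{(1, 2): 4, (2, 1): 2}`, asset values `{1: 10, 2: 5}` and path lengths `{(1, 2): 1, (2, 1): 2}`, the sum is 4·5/1 + 2·10/2 = 30, and dividing by the 2 ordered pairs gives 15.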
In a possible embodiment, data simulation is performed based on the above formula. Assuming that the number of independent service partitions is 3, the corresponding simulation results are shown in Table 1.
Table 1. Partial data simulation results based on the lateral penetration risk index analysis formula
For the above simulation data, the lateral penetration risk index $LPRV = 210$ (dimensionless, since it is a composite risk index calculated from several relative quantities) reflects the overall lateral penetration risk level among the 3 independent service partitions (self-to-self cases excluded) in the simulated private cloud environment. The larger the value, the higher the overall lateral penetration risk.
In a preferred embodiment of the present invention, the permission abuse index is analyzed as follows. User behavior logs are extracted from the multidimensional feature data in the private cloud environment and classified by the user they are tagged with, giving the user behavior logs of each user. Each user behavior log is checked for an abnormal operation mark; a log carrying such a mark is recorded as an abnormal operation record of the corresponding user, and the operation time and the anomaly level associated with the mark are extracted.
It should be explained that the anomaly level may be low risk, medium risk or high risk.
The abnormal operation records of each user within a preset window time are extracted and counted to obtain the number of abnormal operation records of each user, denoted $C_u$, where $u$ is the number of the user, $u = 1, 2, \ldots, m$, and $m$ is the number of users.
The anomaly level of each abnormal operation record is converted into a value based on a preset correspondence between anomaly level and abnormal operation risk index, giving the abnormal operation risk index of each abnormal operation record of each user, denoted $R_{ub}$, where $b$ is the number of the monitored abnormal operation record, $b = 1, 2, \ldots, B$, and $B$ is the number of monitored abnormal operation records.
In a preferred embodiment, the correspondence between anomaly level and abnormal operation risk index is as follows: low risk corresponds to an abnormal operation risk index of 2, medium risk to 3, and high risk to 4.
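The extraction and assignment steps can be sketched as below. The mapping values 2, 3 and 4 are those of the preferred embodiment; the log layout (a dictionary with the keys `user`, `time`, `abnormal` and `level`) and the function name are hypothetical.

```python
from datetime import datetime, timedelta

# Level-to-index correspondence from the preferred embodiment.
RISK_INDEX = {"low": 2, "medium": 3, "high": 4}


def abnormal_risk_indices(logs, now, window):
    """Return {user: [risk index, ...]} for flagged logs inside the window.

    Each log is a dict with the hypothetical keys
    'user', 'time', 'abnormal' and 'level'.
    """
    result = {}
    for log in logs:
        in_window = now - window <= log["time"] <= now
        if log["abnormal"] and in_window:
            result.setdefault(log["user"], []).append(RISK_INDEX[log["level"]])
    return result
```

The length of each user's list is that user's number of abnormal operation records in the window, and the list entries are the per-record risk indices.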
The permission change time of each user is obtained, and the difference between the current time and the permission change time gives the permission change duration of each user. The ratio of the permission change duration to the window time gives the permission change stability of each user, denoted $S_u$.
It should be noted that $S_u$ reflects the stability of the user's permissions. A larger $S_u$ means that the permissions have remained unchanged for a long time and stability is high (for example, the fixed permissions of a system administrator); a smaller $S_u$ means that the permissions have been adjusted recently and the potential risk is high (for example, a scenario in which temporary permissions have just been granted).
It should be explained that a permission change is any adjustment of a user's permission level, access range or operation capability in private cloud security management. It includes, but is not limited to, the addition and deletion of user roles (e.g., promotion from an ordinary user to an administrator), the adjustment of resource access permissions (e.g., newly granted file read/write permissions), the authorization or revocation of sensitive operations (e.g., database deletion permission), and the modification of permission validity periods (e.g., the issuance and reclamation of temporary permissions).
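The stability calculation is a plain ratio of two durations, which can be sketched as follows. The function name and argument names are illustrative.

```python
from datetime import datetime, timedelta


def permission_change_stability(now, last_change, window):
    """Ratio of the time since the last permission change to the window length."""
    return (now - last_change) / window
```

For example, a permission changed 24 hours ago measured against a 12-hour window gives a stability of 2.0, while a change made just now gives a value near 0. Note that the ratio is not capped at 1 in this sketch.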
The permission abuse index $PAI$ is obtained with the formula

$$PAI = \max_{u} \left\{ \frac{1}{B} \sum_{b=1}^{B} \frac{C_u \cdot R_{ub} \cdot K_u}{S_u} \right\}$$

where $K_u$ denotes the user permission level coefficient of the $u$-th user, predefined by the system administrator based on the preset user permission level.
It should be noted that the above formula is constructed according to the following logic:
1. Core calculation: the term $\frac{C_u \cdot R_{ub} \cdot K_u}{S_u}$ combines the data related to a user's abnormal operations. $C_u$ is the number of abnormal operation records of user $u$ within the preset window time and reflects how frequently the user operates abnormally. $R_{ub}$ is the abnormal operation risk index of the $b$-th monitored abnormal operation record of user $u$, assigned according to the preset correspondence between anomaly level and abnormal operation risk index, and measures the risk of each abnormal operation. $K_u$ is the user permission level coefficient of user $u$, predefined by the system administrator based on the preset user permission level; it reflects the user's permissions, and the higher the permissions, the greater the damage that abuse may cause. $S_u$ is the permission change stability of user $u$, obtained as the ratio of the permission change duration (the difference between the current time and the permission change time) to the window time; the lower the stability, the more recently the permissions have changed and the higher the possible risk of permission abuse. The term therefore accounts for the influence of abnormal operation frequency, risk degree, user permissions and permission change stability on the permission abuse risk. Averaging it over all monitored abnormal operation records (from $b = 1$ to $B$) gives the average permission abuse risk value of user $u$ under the current monitoring conditions.
2. The max function: when there are multiple users (or multiple calculations for the same user under different monitoring windows), the maximum average permission abuse risk value is taken as the permission abuse index. This highlights the highest-risk situation: as soon as any user shows a significant risk of permission abuse, it must be examined and handled, so that serious harm to the system from potential permission abuse is discovered and prevented in time.
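The construction logic can be sketched under one explicit assumption: the per-record term is taken as the record count times the record's risk index times the permission level coefficient, divided by the permission change stability, so that lower stability raises the risk. The per-user mean of that term is computed and the maximum over users is returned. The input layout and key names are hypothetical.

```python
def permission_abuse_index(users):
    """Maximum over users of the mean of count * risk * level_coeff / stability.

    `users` maps a user id to a dict with the hypothetical keys
    'risks' (per-record risk indices), 'level_coeff' and 'stability'.
    Dividing by stability is an assumption of this sketch.
    """
    best = 0.0
    for info in users.values():
        risks = info["risks"]
        if not risks or info["stability"] <= 0:
            # No abnormal records, or stability undefined: skipped here.
            continue
        count = len(risks)
        per_record = [
            count * r * info["level_coeff"] / info["stability"] for r in risks
        ]
        best = max(best, sum(per_record) / count)
    return best
```

With a hypothetical user holding risks `[2, 4]`, coefficient 1.0 and stability 2.0 (mean term 3.0) and another holding risks `[3]`, coefficient 2.0 and stability 1.0 (mean term 6.0), the function returns 6.0, the value of the riskier user.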
In a possible embodiment, data simulation is performed based on the above formula, and the corresponding simulation results are shown in Table 2.
Table 2. Partial data simulation results based on the permission abuse index analysis formula
In the simulation, for ease of calculation, the abnormal operation risk indices of all monitored abnormal operation records of a given user are set to the same value.
For the above simulation data, the permission abuse index $PAI = 20$ (dimensionless, since it is a risk indicator calculated from a combination of several relative factors) indicates that, within the simulated group of users, the permission abuse risk of a user is at a relatively high level. In practical applications, this value can serve as a reference for judging whether a permission abuse risk exists in the private cloud system and how severe it is.
In a preferred embodiment of the invention, the internal network security anomaly of the private cloud is evaluated as follows: the lateral penetration risk index and the permission abuse index are extracted and summed according to their weights to obtain the internal network security anomaly index of the private cloud, which is used to evaluate the degree of internal network security anomaly of the private cloud.
It should be noted that the weights of the lateral penetration risk index and the permission abuse index are set according to the main business characteristics, the security emphasis and the historical data. If the partitions interact frequently in the business, lateral penetration risk has a large impact on the business and the lateral penetration risk index is given a higher weight; if the business places high demands on permission management, the permission abuse index is given a higher weight. If the enterprise is more concerned with preventing lateral attacks, the weight of the lateral penetration risk index is increased; if it is more concerned with permission abuse, the weight of the permission abuse index is increased. Historical data is also consulted: whichever type of anomaly has caused more problems receives the higher weight.
Illustratively, the lateral permeation risk index and the entitlement abuse index correspond to weights of。
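The weighted summation can be sketched as a small helper. The function name is illustrative, and the weights used in the usage note are placeholders chosen for the example, not values taken from this description.

```python
def weighted_anomaly_index(indices, weights):
    """Weighted sum of component indices; both arguments are equal-length sequences."""
    if len(indices) != len(weights):
        raise ValueError("indices and weights must have the same length")
    return sum(w * x for w, x in zip(weights, indices))
```

For instance, with placeholder weights of 0.5 each, component indices of 10.0 and 20.0 combine to 15.0. The same helper applies wherever two indices are combined by weights.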
S4, global network security analysis: a system resource anomaly rate and a log event entropy value are constructed from the user behavior logs and the system usage records, and the global network security anomaly of the private cloud is evaluated.
In a preferred embodiment of the present invention, the system resource anomaly rate is analyzed as follows. The system usage records are extracted from the multidimensional feature data in the private cloud environment, the real-time utilization of each system resource within the preset window time is obtained, and the historical average utilization of each system resource is obtained from the system usage records at the same time.
The system resource anomaly rate $SRAR$ is obtained with the formula

$$SRAR = \frac{1}{k} \sum_{r=1}^{k} \max\left(U_r - \bar{U}_r,\; 0\right)$$

where $U_r$ denotes the real-time utilization of the $r$-th system resource, $\bar{U}_r$ denotes the historical average utilization of the $r$-th system resource, $r$ is the number of the system resource, $r = 1, 2, \ldots, k$, and $k$ is the number of system resources.
It should be noted that the above formula is constructed according to the following logic:
1. Difference: the difference $U_r - \bar{U}_r$ is calculated first to compare the real-time utilization $U_r$ with the historical average utilization $\bar{U}_r$. When the real-time utilization is higher than the historical average, the larger the gap, the larger the value; when the real-time utilization is not higher than the historical average, the value is not positive.
2. Screening for effective differences: $\max(U_r - \bar{U}_r, 0)$ retains only the differences greater than 0.
3. Composite deviation: summing $\max(U_r - \bar{U}_r, 0)$ over all system resources (from $r = 1$ to $k$) gives the composite deviation by which resource utilization exceeds the historical average level.
4. Anomaly rate: dividing the composite deviation by the total number of system resources $k$ gives the system resource anomaly rate $SRAR$, which reflects the degree of anomaly of the system resources as a whole.
5. Practical significance: the system resource anomaly rate $SRAR$ quantifies the degree of anomaly and provides a quantitative indicator for private cloud system management. The higher the value, the more the real-time utilization of the system resources exceeds the historical average level and the more serious the abnormal resource usage. It also assists operation and maintenance decisions: when $SRAR$ exceeds the normal range, an administrator can investigate accordingly, for example by determining whether resource overload or abnormal tasks exist, and then take measures such as optimizing resource allocation or adjusting task scheduling to keep the system running stably.
In the private cloud environment, the system resources include, for example, the server CPU, memory, hard disk storage space and network bandwidth of an enterprise private cloud. The system resource anomaly rate is calculated from their real-time usage and historical average usage, which helps keep the private cloud running stably.
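The calculation can be sketched as below, assuming that the deviation of each resource is the plain difference between its real-time and historical average utilization, clipped at zero, and that the result is averaged over the resources. The function name and the sequence-based inputs are illustrative.

```python
def system_resource_anomaly_rate(realtime, historical):
    """Mean positive excess of real-time utilization over the historical average.

    `realtime` and `historical` are equal-length sequences of utilizations
    (for example CPU, memory, storage, bandwidth) expressed as fractions.
    """
    if len(realtime) != len(historical) or not realtime:
        raise ValueError("need two non-empty sequences of equal length")
    excess = [max(u - h, 0.0) for u, h in zip(realtime, historical)]
    return sum(excess) / len(excess)
```

With hypothetical real-time utilizations `[0.75, 0.5, 0.5, 0.25]` and historical averages `[0.5, 0.75, 0.25, 0.25]`, only the first and third resources exceed their averages (by 0.25 each), so the rate is 0.5 / 4 = 0.125.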
In a preferred embodiment of the present invention, the log event entropy value is analyzed as follows. From the multidimensional feature data in the private cloud environment, the number of occurrences of each type of log event within the preset window time is obtained based on the user behavior logs, and the proportion of each type of log event in the total is then calculated and denoted $q_p$, where $p$ is the number of the log event type, $p = 1, 2, \ldots, P$, and $P$ is the number of log event types.
It should be explained that a log event is operation behavior data recorded in structured or semi-structured form, containing elements such as a timestamp, user identifier, operation type, resource object and result status. Log events include, but are not limited to, user logins and logouts, file read/write operations, database queries and modifications, permission change records, and system error and warning messages.
The log event entropy value $H$ is obtained with the formula

$$H = -\sum_{p=1}^{P} q_p \log q_p$$

The formula is constructed according to the following logic:
1. Meaning of $q_p$: $q_p$ is the proportion of the $p$-th type of log event within the preset window time. In a private cloud environment, different log events occur with different frequencies, and the proportion of each type in the total number of log events shows its relative frequency. For example, if 100 log events are generated in a given period and 20 of them are resource access events, then $q_p$ for resource access events is $20/100 = 0.2$. The values $q_p$ reflect how the different log events are distributed within the whole.
2. The logarithmic term: $\log q_p$ measures the uncertainty of the log event with proportion $q_p$. When $q_p$ is close to 0, the event rarely occurs and $|\log q_p|$ is large; when $q_p$ is close to 1, the event dominates the whole and $|\log q_p|$ is close to 0. Weighted by $q_p$, the term $-q_p \log q_p$ is small at both extremes and larger for intermediate proportions. As a result, when a single type of log event dominates, the uncertainty is low, and the more balanced the proportions of the various types, the higher the uncertainty.
3. Summation and sign: the terms $q_p \log q_p$ are summed from $p = 1$ to $P$ (the number of log event types) and the sum is negated to obtain the log event entropy value $H$. The summation takes the uncertainty of all log event types into account, and the negation makes the final entropy value follow the usual convention that a higher entropy indicates a more disordered system. A larger $H$ indicates that the log events are widely dispersed, that multiple complex activities may be taking place in the system, and that anomalies are more likely; a smaller $H$ indicates that the log events are concentrated and the system is running in a relatively stable and orderly way.
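The proportion and entropy calculation can be sketched as follows. The description does not state the base of the logarithm; the natural logarithm is assumed here, and the function name and input format (a list of event type labels) are illustrative.

```python
import math
from collections import Counter


def log_event_entropy(events):
    """Shannon entropy (natural logarithm assumed) of the event-type distribution.

    `events` is a sequence of event type labels observed in the window.
    """
    counts = Counter(events)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each type contributes -q * ln(q), where q is its share of all events.
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A window containing only one event type has entropy 0, while two equally frequent types give ln 2, which illustrates that a more balanced distribution produces a higher value.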
In a preferred embodiment of the invention, the global network security anomaly of the private cloud is evaluated as follows: the system resource anomaly rate and the log event entropy value are extracted and summed according to their weights to obtain the global network security anomaly index of the private cloud, which is used to evaluate the degree of global network security anomaly of the private cloud.
The weights of the system resource anomaly rate and the log event entropy value are set on the following basis. First, business characteristics: for businesses that depend heavily on the stability of system resources, the system resource anomaly rate is given a higher weight; for businesses centered on data interaction and operation records, the log event entropy value is given a higher weight. Second, security policy: if the enterprise pays more attention to resource security, the weight of the system resource anomaly rate is increased; if it focuses on monitoring abnormal operation behavior, the weight of the log event entropy value is increased. Third, historical data and risk assessment: whichever type of anomaly has caused more security problems receives the higher weight.
Exemplary, the system resource anomaly rate and the log event entropy value correspond to weights of。
By comprehensively evaluating the internal network security anomaly and the global network security anomaly based on the multidimensional feature data in the private cloud environment, the invention can perceive the overall security situation, locate problems quickly, trigger handling, block threats in time, and thereby safeguard the private cloud and improve its overall security and reliability.
S5, anomaly response decision: based on a preset security policy library, the security anomaly condition of each independent service partition and the global security anomaly condition are comprehensively analyzed, and handling measures of the corresponding level are triggered.
In a preferred embodiment of the present invention, the comprehensive analysis of the security anomaly condition of each independent service partition and the global security anomaly condition is performed as follows: the internal network security anomaly index and the global network security anomaly index of the private cloud are extracted and compared with a preset internal network security anomaly index threshold and a preset global network security anomaly index threshold, respectively.
It should be noted that the internal network security anomaly index threshold is set according to the level of internal security risk that the main business can accept. For example, for a financial enterprise with extremely strict data security and permission management requirements, the threshold is set low, because once the index exceeds it, internal permission abuse or lateral penetration risk may seriously affect the business.
It should be noted that the global network security anomaly index threshold is set according to the business's requirements for overall network security and stable operation. Taking an e-commerce enterprise as an example, to ensure that service is not interrupted during a sales promotion, the threshold is set slightly higher, allowing a certain degree of system resource fluctuation and log anomaly.
If the internal network security anomaly index and the global network security anomaly index are both less than or equal to their corresponding thresholds, it is determined that no security anomaly exists.
If the internal network security anomaly index is greater than the internal network security anomaly index threshold, or the global network security anomaly index is greater than the global network security anomaly index threshold, it is determined that a security anomaly exists.
When a security anomaly is determined to exist, the direction of the security anomaly is further determined and fed back.
In a preferred embodiment of the present invention, the direction of the security anomaly is further determined as follows. Referring to Fig. 2, if the internal network security anomaly index is greater than the internal network security anomaly index threshold, the security anomaly is identified as pointing to an internal network security anomaly.
Referring to Fig. 3, if the global network security anomaly index is greater than the global network security anomaly index threshold, the security anomaly is identified as pointing to a global network security anomaly.
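The threshold comparison and direction determination can be sketched as a single function. The function name and the string labels `"internal"` and `"global"` are illustrative choices.

```python
def classify_anomaly(internal_index, global_index, internal_threshold, global_threshold):
    """Return the list of anomaly directions; an empty list means no security anomaly."""
    directions = []
    if internal_index > internal_threshold:
        directions.append("internal")
    if global_index > global_threshold:
        directions.append("global")
    return directions
```

An index exactly equal to its threshold is not treated as anomalous, matching the "less than or equal to" condition for the no-anomaly case; both directions can be reported at once.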
In a preferred embodiment of the present invention, when the security anomaly is identified as pointing to an internal network security anomaly, the abnormal independent service partition and the abnormal user are further identified as follows: the lateral penetration risk indices of the independent service partitions are extracted and sorted in descending order, and the independent service partition with the largest lateral penetration risk index is marked as the abnormal independent service partition.
The permission abuse index of each user is extracted, and the user with the largest permission abuse index is selected as the abnormal user.
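Selecting the partition and the user with the largest index values can be sketched as follows. The per-partition and per-user values are assumed to be available as dictionaries; how they are derived is not part of this sketch, and the function name is illustrative.

```python
def locate_anomaly_sources(partition_risk, user_abuse):
    """Pick the partition and the user with the largest index values.

    Both arguments are non-empty dicts mapping an identifier to its index.
    """
    abnormal_partition = max(partition_risk, key=partition_risk.get)
    abnormal_user = max(user_abuse, key=user_abuse.get)
    return abnormal_partition, abnormal_user
```

Taking the maximum directly is equivalent to sorting in descending order and reading the first entry, without needing the full ordering.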
By comprehensively analyzing the security anomaly condition of each independent service partition and the global security anomaly condition and then triggering handling measures of the corresponding level, the invention can locate threat sources accurately, quickly determine whether an anomaly is internal or global, and take targeted measures for different anomalies. It enables automated rapid response, greatly shortens handling time and ensures business continuity.
The foregoing merely illustrates and explains the principles of the invention. Those skilled in the art may make various modifications and additions to the specific embodiments described, or substitute similar arrangements, without departing from the principles of the invention or exceeding the scope of the invention as defined in the claims.