Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are apparently only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, an embodiment of the present invention provides an asset risk tracking and tracing method.
In this embodiment, steps S01-S06 describe the process of the asset risk tracking and tracing method of the present application in detail.
S01, acquiring network asset basic data, where the network asset basic data includes an enterprise network topology graph and an enterprise asset vulnerability risk list.
More specifically, the drawing process and application of the enterprise network topology graph are as follows:
Network scanning is used to acquire the IP address, port number and service type information of enterprise network devices, yielding a comprehensive network asset scanning result. An enterprise network topology graph is drawn from the scanning result to determine the distribution and connection relationships of the network assets, providing basic data for subsequent asset mapping and risk identification. The acquired network device information is automatically classified and sorted, and divided by operating system type, vendor type and network location to form a hierarchical classification system and a network asset inventory. Based on the network topology graph, network devices are layered according to asset importance and business relevance, and the boundaries of the core, aggregation and access layers and their connection dependencies are defined. Deep analysis of the network assets identifies system vulnerabilities and configuration-defect security risks, quantified risk levels are calculated, and a network asset risk map is formed. Through asset mapping, integration with the IT operation and maintenance management platform and the CMDB automatically and synchronously updates asset attributes and association relationships, associates network asset attribute information with the business systems, and combs out business dependencies and data transmission paths. Based on the asset mapping, network isolation and access control are used to divide network assets into security domains according to business importance, data sensitivity and compliance requirements, and isolation restrictions between different security domains are enforced.
Through continuous network scanning and asset mapping, network asset information is dynamically updated, newly added or changed devices are discovered in a timely manner, their security risks are evaluated, and corresponding hardening and optimization are carried out, building a dynamic, visualized network asset management and security operation system and improving the overall network security protection level.
Through deep analysis of network assets, security risks such as system vulnerabilities and configuration defects are identified, with a focus on network devices that lack password complexity requirements or have not changed default passwords. Quantified risk levels are calculated under the CVSS vulnerability scoring system from factors such as exploitability, scope of influence and degree of harm, forming a risk map of the network assets and providing decision support for subsequent security hardening and protection strategy formulation. Using asset mapping, device configuration information is collected via the SNMP protocol, the WMI interface and other means; integration with the IT operation and maintenance management platform, the CMDB and other systems automatically and synchronously updates asset attributes and association relationships; attribute information of network assets is associated with the business systems to determine the business applications and data flow directions of each asset; and business dependencies and data transmission paths are combed out, forming a comprehensive asset mapping view that facilitates business impact analysis and risk assessment. When scanning network assets, tools such as Nmap can perform full-port scanning on a designated network segment at a rate of 1000 packets per second using TCP SYN scanning, TCP connect scanning, UDP scanning and similar modes, obtaining information such as the IP addresses, open ports and service versions of the network devices.
The scan result can be exported into XML, JSON and other formats and analyzed by a Python script that extracts key fields and classifies and summarizes them by operating system type (such as Windows, Linux, Cisco IOS), vendor model (such as Cisco, Juniper, HP) and network location (such as office area, production area, DMZ) to form a network asset inventory. Meanwhile, visualization tools such as Gephi and Cytoscape can draw a network topology graph from the connection relationships among the assets, intuitively displaying the distribution and interconnection of the network devices. For vulnerability risk identification, scanning tools such as Nessus and OpenVAS can perform security inspection on network devices to identify common vulnerabilities and configuration defects, such as default passwords, weak passwords, MS17-010 and Heartbleed. Vulnerability scanning can regularly update its detection rules from the CVE vulnerability database; the scanning result can reference the CVSS scoring standard, evaluated comprehensively across dimensions such as Attack Vector (AV), Attack Complexity (AC), User Interaction (UI), Privileges Required (PR) and Scope (S) to obtain a vulnerability risk level of 0-10 points, together with repair suggestions. For example, high-risk vulnerabilities scored above 7 points need to be repaired within 5 working days, and medium-risk vulnerabilities scored between 4 and 6 points within 10 working days.
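The XML export and classification step described above can be sketched as follows. This is a minimal illustration, not the application's actual script: the embedded XML is a simplified sample in the shape of Nmap's output, and the OS families and field names are assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative, simplified Nmap-style XML (not real scanner output).
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open"/><service name="ssh"/></port>
      <port protocol="tcp" portid="80"><state state="open"/><service name="http"/></port>
    </ports>
    <os><osmatch name="Linux 5.4"/></os>
  </host>
</nmaprun>"""

def parse_nmap_xml(xml_text):
    """Extract IP, open ports/services and OS guess for each scanned host."""
    assets = []
    for host in ET.fromstring(xml_text).iter("host"):
        addr = host.find("address").get("addr")
        ports = [
            {"port": int(p.get("portid")), "service": p.find("service").get("name")}
            for p in host.iter("port")
            if p.find("state").get("state") == "open"
        ]
        osmatch = host.find("os/osmatch")
        assets.append({"ip": addr,
                       "os": osmatch.get("name") if osmatch is not None else "unknown",
                       "ports": ports})
    return assets

def classify_by_os(assets):
    """Group asset IPs into an inventory keyed by coarse OS family."""
    inventory = {}
    for a in assets:
        family = next((f for f in ("Windows", "Linux", "Cisco") if f in a["os"]), "Other")
        inventory.setdefault(family, []).append(a["ip"])
    return inventory
```

A real deployment would feed `nmap -oX` output into `parse_nmap_xml` and merge vendor and network-location fields from other sources before building the inventory.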
In the asset mapping process, the SNMP v2/v3 protocol can be used to periodically collect performance indicator data such as system information, interface information, and CPU and memory utilization of network devices, with data collection and analysis implemented through the Python pysnmp library. For Windows servers, configuration information may be obtained remotely through the WMI interface using wmic commands. The collected data can be synchronized and compared with the IT asset management system, the Configuration Management Database (CMDB) and the like; time-series algorithms such as ARIMA and Prophet predict the utilization trend of resources such as CPU and memory over the coming week, detect abnormal data in time, and generate a resource utilization report to facilitate capacity planning and optimization. For security domain division, network assets can be divided into a core domain, production domain, development and test domain, office domain, internet domain and the like according to business attributes and data sensitivity; following the idea of defense in depth, security devices such as Access Control Lists (ACL), an Intrusion Prevention System (IPS) and a Web Application Firewall (WAF) are deployed at the boundaries of and between the security domains, with corresponding security policies configured. For example, between the Internet and the intranet, external access can be limited by deploying an Nginx reverse proxy with an IP whitelist so that only trusted source IPs may access internal applications; between the office network and the production network, user identity authentication can be implemented with a VPN gateway using two-factor authentication (2FA), and network isolation can be achieved through VLAN division.
Meanwhile, a threat intelligence platform can be used to collect external IP reputation data and score the risk of each access source IP, achieving dynamic access control and reducing the risk of network attack.
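The whitelist restriction between the Internet and the intranet can be sketched with the standard library's `ipaddress` module. The networks below are illustrative placeholders, not the enterprise's real address plan; in practice the check would sit in (or in front of) the Nginx reverse proxy.

```python
import ipaddress

# Hypothetical trusted networks allowed to reach internal applications.
WHITELIST = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "203.0.113.0/24")]

def is_allowed(source_ip):
    """Return True if the source address falls inside any whitelisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in WHITELIST)
```

An IP-reputation score from a threat intelligence feed could be combined with this boolean check to make the access decision dynamic rather than static.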
More specifically, the drawing process and application of the enterprise asset vulnerability risk list are as follows:
For the target devices and systems discovered by network asset scanning, the Nessus and Nexpose vulnerability scanning tools are used to detect security vulnerabilities; the security vulnerabilities existing in the assets are identified through vulnerability feature library matching and vulnerability verification; and the harmfulness of the vulnerabilities is quantitatively evaluated under the CVSS (Common Vulnerability Scoring System) to form an enterprise asset vulnerability risk list and determine the risk level of the vulnerabilities.
As shown in fig. 2, by setting a scanning policy and designating the scanning range, scanning mode and scanning depth, vulnerability scanning configuration parameters of the target asset are obtained. A preset set of vulnerability identification techniques, including vulnerability detection based on signature matching, protocol analysis based on a state machine, and patch comparison analysis based on reverse engineering, identifies the system vulnerabilities, Web vulnerabilities and weak-password security vulnerabilities existing in the target asset across the dimensions of operating system, application software, database and network device. The name, number, type and hazard attribute information of each security vulnerability is acquired from the identification result. The identified security vulnerabilities are comprehensively evaluated under the industry-standard CVSS vulnerability scoring system across the dimensions of attack vector, attack complexity, privileges required, user interaction and scope, obtaining a CVSS score for each vulnerability. The risk level of each security vulnerability is judged from its CVSS score. The vulnerability scanning result and the risk assessment information are integrated to generate a vulnerability risk list of the enterprise assets. The vulnerability risk list includes the number of vulnerabilities, the vulnerability types and the risk distribution of each asset, and is sorted by risk level to determine the vulnerabilities that need focused repair.
Specifically, for the target devices and systems discovered by network asset scanning, the Nessus and Nexpose vulnerability scanning tools are adopted to comprehensively detect security vulnerabilities. By setting a scanning strategy that specifies the scanning range, scanning mode and scanning depth, vulnerability scanning of the target asset is realized. Common vulnerability scanning strategies include full-port scanning, system vulnerability detection and Web application vulnerability detection, and an appropriate scanning mode (such as SYN half-open or full TCP connect) and detection intensity (such as lightweight, normal or deep) is selected according to the network environment and system type. During scanning, the tool interacts with the target system, matches information against its built-in vulnerability feature library, and analyzes the responses to its probe packets to judge whether known vulnerabilities exist. The scanning tool applies multiple vulnerability identification technologies, such as signature-based vulnerability detection, state-machine-based protocol analysis and reverse-engineering-based patch comparison analysis, to identify security vulnerabilities such as system vulnerabilities, Web vulnerabilities and weak passwords across the dimensions of operating system, application software, database and network device, and acquires attribute information such as the name, number, type and hazard of each vulnerability to form a detailed vulnerability scanning report. Each scanned security vulnerability is then comprehensively evaluated under the industry-standard CVSS vulnerability scoring system across dimensions such as attack vector, attack complexity, privileges required, user interaction and scope.
Taking CVSS v2 as an example, the base score formula is: base score = ((0.6 × Impact) + (0.4 × Exploitability) − 1.5) × f(Impact), where f(Impact) is 0 when the impact subscore is 0 and 1.176 otherwise. The CVSS score of a vulnerability is obtained from this formula, quantifying its harmfulness. According to the CVSS score, security vulnerabilities can be classified into different risk levels: for example, a score above 9 points is critical risk, 7-8 points is high risk, 4-6 points is medium risk, and 0-3 points is low risk, and each vulnerability is labeled with its risk level. The vulnerability scanning result and the risk assessment information are integrated to generate a vulnerability risk list of the enterprise assets, determining the number of vulnerabilities, vulnerability types and risk distribution of each asset, which is sorted by risk level to determine the vulnerabilities needing focused repair. On this basis, the priority order of vulnerability repair is determined from factors such as the risk level, degree of business impact and repair difficulty of each vulnerability, and a staged repair plan is formulated. Finally, according to the repair priorities, targeted repair and protection schemes are formulated, such as deploying security patches, upgrading software versions, optimizing security configurations and deploying security protection devices; repair progress is continuously tracked and periodically retested to ensure that security vulnerabilities are effectively repaired and the security risks facing the enterprise are reduced.
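The CVSS v2 base equation and the risk-band mapping used in this document can be sketched as follows. The metric weights passed in (access vector, access complexity, authentication, C/I/A impact) follow the published CVSS v2 scales; the repair deadlines are the ones stated above, and the critical/high/medium/low cutoffs are an assumption consistent with the text.

```python
def cvss2_base(av, ac, au, c, i, a):
    """CVSS v2 base score from the six base-metric weights (each on its 0..1 scale)."""
    impact = 10.41 * (1 - (1 - c) * (1 - i) * (1 - a))
    exploitability = 20 * av * ac * au
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)

def risk_level(score):
    """Map a CVSS score onto the risk bands used in this document."""
    if score >= 9:
        return "critical"
    if score >= 7:
        return "high"
    if score >= 4:
        return "medium"
    return "low"

# Repair deadlines (working days) from the text; critical assumed to share the 5-day SLA.
REPAIR_SLA_DAYS = {"critical": 5, "high": 5, "medium": 10, "low": None}
```

For instance, a network-exploitable vulnerability with complete C/I/A impact (AV:N=1.0, AC:L=0.71, Au:N=0.704, C/I/A:C=0.66) scores 10.0, while partial C/I/A impact (0.275 each) yields the familiar 7.5.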
When vulnerability scanning is performed, the Nmap tool can run a TCP SYN scan over the target IP address range, sending SYN packets to target ports and judging the port state from the responses; the scanning rate can be set to 1000 packets per second to rapidly identify the network topology and port information of the target assets. Then, a specialized vulnerability scanner such as Nessus targets the services behind the open ports, such as Web, database and mail services, using its built-in vulnerability plugin library to send specific probe requests and analyze the response data, judging whether security vulnerabilities such as SQL injection, XSS cross-site scripting or remote command execution exist. For example, for a Web application, a regular-expression matching algorithm can detect whether the response contains specific error information, such as database or Web container error messages, to identify potential injection points; for an operating system, the version can be judged from banner information and fingerprint characteristics, and POC verification scripts from a vulnerability knowledge base can simulate attack requests to confirm the existence of vulnerabilities. The scanning process can be divided into a preliminary detection stage and a deep detection stage: preliminary detection mainly identifies common high-risk vulnerabilities, with scanning time controlled within 1 hour, while deep detection further verifies the suspected vulnerabilities using exploit code, with tools such as Sqlmap and Metasploit attempting to acquire system privileges to determine exploitability.
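The regular-expression matching of error information mentioned above can be sketched like this. The two signatures shown are illustrative samples only; a real scanner ships a far larger signature library, and the vulnerability-class names are assumptions.

```python
import re

# Illustrative error signatures (a real plugin library is much larger).
ERROR_SIGNATURES = {
    "sql_injection": re.compile(
        r"(SQL syntax.*MySQL|ORA-\d{5}|unterminated quoted string)", re.I),
    "path_disclosure": re.compile(r"[A-Z]:\\(inetpub|www)|/var/www/", re.I),
}

def detect_signatures(response_body):
    """Return the vulnerability classes whose signature matches the HTTP response body."""
    return sorted(name for name, pat in ERROR_SIGNATURES.items()
                  if pat.search(response_body))
```

In practice the scanner would apply such checks to responses of crafted probe requests, flagging matches as candidate injection points for the deep-detection stage.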
The scanning result can be exported as a standardized XML report and displayed and managed uniformly through a custom-developed vulnerability management platform. The platform is developed with the Python Django framework, its front end is implemented with Vue.js, vulnerability data is stored in Elasticsearch, and a CVSS scoring interface is called to score the risk of each vulnerability, forming a vulnerability risk matrix diagram that intuitively presents the vulnerability distribution of each business system. The vulnerability management platform can also count the proportion of each vulnerability class, for example 30% high risk, 50% medium risk and 20% low risk, and rank the assets by risk in combination with an asset importance assessment algorithm, calculating a risk value for each asset, where risk value = asset importance × vulnerability risk score × number of vulnerabilities. A vulnerability repair plan is formulated in descending order of risk value and pushed to the relevant operation and maintenance personnel for handling. During vulnerability repair, a second scan should be performed to verify that the patch has taken effect and the vulnerability has been eliminated.
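The asset risk-value formula and the descending-order ranking can be sketched as follows. The source does not fix how the per-asset "vulnerability risk score" is aggregated, so taking the mean CVSS score is an assumption of this sketch, as are the sample asset records.

```python
def asset_risk(importance, cvss_scores):
    """Document formula: risk value = asset importance x vulnerability risk score x count.
    The per-asset 'vulnerability risk score' is taken here as the mean CVSS score
    (an assumption; the source leaves the aggregation unspecified)."""
    if not cvss_scores:
        return 0.0
    mean_score = sum(cvss_scores) / len(cvss_scores)
    return importance * mean_score * len(cvss_scores)

def rank_assets(assets):
    """Sort assets by descending risk value for remediation planning."""
    return sorted(assets,
                  key=lambda a: asset_risk(a["importance"], a["cvss"]),
                  reverse=True)
```

For example, a database server of importance 5 carrying CVSS 9.0 and 7.5 vulnerabilities (risk 5 × 8.25 × 2 = 82.5) ranks ahead of a web server of importance 3 with one CVSS 5.0 finding (risk 15.0).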
The application ensures comprehensive identification and accurate mapping of enterprise network assets by acquiring the IP address, port number and service type information of network devices, providing detailed basic data for subsequent risk assessment. Further, the enterprise network topology graph drawn from this information provides an intuitive view for visualizing the network structure and identifying potential security weaknesses, making security analysis more efficient and accurate. In addition, by combining vulnerability scanning tools such as Nessus and Nexpose, the security vulnerabilities of the assets are detected, and the discovered vulnerabilities are quantitatively evaluated under the CVSS vulnerability scoring system, enhancing the scientific rigor and systematicness of vulnerability management and ensuring that enterprises can preferentially handle the vulnerabilities posing the greatest threat to network security.
S02, drawing a system asset topology graph according to the network asset basic data, and identifying security risk points existing in key system assets according to the system asset topology graph to obtain a system asset risk list.
As a preferred implementation of Embodiment 1, the drawing of a system asset topology graph according to the network asset basic data, and the identifying of security risk points existing in the key system assets according to the system asset topology graph to obtain a system asset risk list, is specifically:
For the network asset basic data, a system asset topology graph is drawn with the Enterprise Architect architecture design tool through application dependency analysis and architecture analysis, determining the dependency relationships and data flow directions between systems, and the security risk points existing in key system assets are identified according to business importance assessment and risk assessment methods to form a system asset risk list.
Tools such as Application Insights and Dynatrace analyze the API calls, configuration files and database connection information in the application code to extract the dependency relationships among services and construct a directed acyclic graph containing service nodes and dependency edges. From the application dependencies, the Enterprise Architect architecture design tool draws a system asset topology graph across the business, application, data and technology views, presenting the deployment relationships, interface relationships and data flows between system components. After identifying the key assets in the running of the system, the STRIDE threat modeling method identifies the security threats existing in the system architecture across its six dimensions, and the risk value of each asset is calculated. Based on the risk assessment result, a micro-service architecture and containerization realize logical isolation between different services, sensitive data receives desensitization and encryption protection, and a high-availability architecture design with off-site active-active deployment improves the disaster tolerance of the system. During architecture security hardening, a normal behavior baseline is established with machine learning algorithms through log analysis and traffic analysis, abnormal deviation patterns are identified, and the running state of the system is monitored in real time.
Specifically, for the network asset basic data, application dependency analysis with tools such as Application Insights and Dynatrace comprehensively combs the dependency relationships of the enterprise's application systems: information such as API calls, configuration files and database connections in the application code is analyzed to extract the dependencies among services, and, combined with the actual call-chain data collected by APM and other monitoring tools, a Directed Acyclic Graph (DAG) of service nodes and dependency edges is constructed, forming an application dependency topology matrix and laying the foundation for subsequent architecture analysis. Based on the application dependency analysis, the system architecture is visually modeled with the Enterprise Architect architecture design tool, and a system asset topology graph is drawn across the business, application, data and technology views, clearly presenting the deployment relationships, interface relationships and data flow directions among system components, so that the architecture is comprehensively understood and potential architectural defects and risk points are found. From the system asset topology graph, the key assets in the running of the system are identified, including core business applications, key databases and important middleware; business importance is evaluated, and importance scores of the key assets are calculated from aspects such as business revenue, customer impact and regulatory compliance using a combination of qualitative and quantitative methods, forming an importance matrix of the key assets. Based on the identification of the key system assets, a risk assessment method comprehensively analyzes the security risks the system assets face.
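Building the dependency graph from observed calls, and checking that it is in fact acyclic before treating it as a DAG, can be sketched with the standard library. The service names are hypothetical, and real call-chain data from an APM probe would replace the hand-written edge list.

```python
from collections import defaultdict

def build_dependency_graph(calls):
    """Build a directed graph service -> set(dependencies) from (caller, callee) pairs."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
        graph.setdefault(callee, set())  # register leaf services too
    return graph

def is_acyclic(graph):
    """Kahn-style check that the dependency graph is a DAG (no call cycles)."""
    indegree = {node: 0 for node in graph}
    for deps in graph.values():
        for d in deps:
            indegree[d] += 1
    queue = [n for n, deg in indegree.items() if deg == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for d in graph[node]:
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return visited == len(graph)  # all nodes drained => no cycle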
Using the STRIDE threat modeling method, the security threats in the system architecture are identified across the Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege dimensions, and the risk value of each asset is calculated from the vulnerability of the asset itself, the likelihood of the threat being exploited, the impact of the security event and so on. The risk value formula is: risk value = occurrence probability × impact degree, where the occurrence probability and impact degree are each given a quantified score of 1-5 points according to the characteristics of the risk factor, forming a 5×5 risk matrix. A risk value of 15 points or more in the matrix is high risk, 8-14 points is medium risk, and below 8 points is low risk, for which risk treatment schemes are formulated in a targeted manner. Based on the risk assessment result, combined with business and security requirements, targeted security hardening and protection schemes are formulated. A micro-service architecture and containerization technology realize logical isolation between different services, and Service Mesh technology realizes authentication and encrypted communication between services through sidecar proxies. Sensitive data receives desensitization and encryption protection through technical means such as hashing, masking and encryption, applied across the data collection, transmission, storage, processing and application links.
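The 5×5 risk matrix described above reduces to two small functions; the scoring bands are exactly the ones stated in the text.

```python
def risk_value(probability, impact):
    """risk value = occurrence probability x impact degree, each scored 1-5."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("probability and impact must each be scored 1-5")
    return probability * impact

def matrix_level(value):
    """Bands from the document: >=15 high, 8-14 medium, <8 low."""
    if value >= 15:
        return "high"
    if value >= 8:
        return "medium"
    return "low"
```

For example, a threat scored probability 5 and impact 3 lands at 15 points (high risk), while 3 × 3 = 9 is medium and 2 × 3 = 6 is low.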
High-availability architecture design and off-site active-active deployment improve the disaster recovery capability of the system through deployment modes such as multi-site active-active and off-site multi-center, combined with load balancing, data synchronization and failover mechanisms. Meanwhile, active defense is applied to key business and data to discover and block potential threats in time. During architecture security hardening, security monitoring and risk assessment continue through technical means such as log analysis and traffic analysis, monitoring the running state of the system in real time. A normal behavior baseline is established from historical data using unsupervised learning algorithms such as Isolation Forest, One-Class SVM and LOF to identify abnormal deviation patterns; supervised learning algorithms such as SVM, Random Forest and XGBoost train classification models on labeled known-anomaly data to judge new data; and time-series algorithms such as ARIMA, Prophet and LSTM perform trend prediction and anomaly detection on the time series of system indicators. In the application dependency analysis process, the AppDynamics platform can inject probes into the application code to collect, in real time, the performance data and topology dependencies of key business transactions such as method calls, database queries and message queues.
For example, HTTP requests and responses in the application are captured, information such as URL, parameters and response status code is extracted to judge the call relationships among services, and the execution of SQL statements is analyzed, counting indicators such as database read/write counts and latency to find potential performance bottlenecks. Meanwhile, statistical algorithms such as correlation analysis and frequent itemset mining identify association rules among services from massive transaction logs, for example that service A calls service B with 80% probability, or that the response latency of service C affects service D the most, forming a service dependency graph. In architecture security risk assessment, the FAIR (Factor Analysis of Information Risk) framework may be employed to quantify risk from the Threat Event Frequency (TEF) and per-event Loss Expectancy (LEF) dimensions. First, the probability of a specific type of threat occurring is estimated from historical security event data and threat intelligence to obtain the annual threat event frequency. The direct economic loss and indirect loss that a threat event may cause are then evaluated, and the single-event loss expectancy is calculated. Finally, the TEF and LEF values are substituted into the risk formula, risk = TEF × LEF, to obtain the annual risk value. For example, if a data leakage event occurs twice a year and a single event causes an economic loss of 500,000 yuan, the annual risk value is 1,000,000 yuan, a high risk level, and measures such as data desensitization and access control need to be adopted preferentially for prevention and control.
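The worked example above (2 events/year × 500,000 yuan = 1,000,000 yuan) can be reproduced in a few lines; the high-risk threshold below is an assumption consistent with the example, not a FAIR-defined constant.

```python
def annualized_risk(tef_per_year, loss_per_event):
    """Document formula: risk = TEF x LEF (annual threat event frequency
    times per-event loss expectancy)."""
    return tef_per_year * loss_per_event

def treatment_priority(annual_risk, high_threshold=1_000_000):
    """Hypothetical cutoff: annual losses at or above the threshold are high risk
    and should be mitigated first (desensitization, access control, etc.)."""
    return "high" if annual_risk >= high_threshold else "review"
```

The same two functions apply to any threat class once its TEF and LEF estimates are in hand from incident history and threat intelligence.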
In implementing micro-service architecture security hardening, a service mesh platform such as Istio may be used: by deploying sidecar proxy containers in the Kubernetes cluster and taking over inter-service network traffic, it provides fine-grained traffic control and security protection. For example, mutual TLS authentication is enabled so that inter-service communication is encrypted and cannot be intercepted; role-based access control rules on service consumers strictly limit access by unauthorized services; and circuit breaking, rate limiting and degradation mechanisms improve service availability and stability. Meanwhile, the observability features of the service mesh monitor service call indicators such as QPS, latency and error rate in real time, with visualization and alerting implemented through tools such as Prometheus and Grafana. For data security protection, a Format-Preserving Encryption (FPE) algorithm can replace sensitive data with ciphertext of equal length, so the desensitized data still meets the format requirements of the business application. For example, to desensitize mobile phone numbers, a regular expression such as \d{3}\d{4}\d{4} can match the mobile number format, and an AES-based generator then produces random digits of the corresponding length for replacement; the replaced number is still an 11-digit number, but not the real one, effectively preventing privacy leakage. When selecting the abnormal behavior detection algorithm, a real-time processing framework based on Spark Streaming can analyze massive data such as system logs and network traffic in real time, using DataFrame- and Dataset-style distributed data structures in the streaming jobs to facilitate data transformation and statistical analysis.
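The format-preserving phone-number desensitization can be sketched as follows. This is a toy substitute for real FPE: instead of AES-based FPE (e.g. FF1/FF3), it derives the middle four digits from a keyed hash so the result is still an 11-digit number but not the original. The regex and demo key are assumptions.

```python
import hashlib
import re

PHONE_RE = re.compile(r"\b1\d{10}\b")  # 11-digit numbers starting with 1

def pseudonymize(number, key=b"demo-key"):
    """Replace the middle four digits with digits derived from a keyed hash.
    A format-preserving sketch only: production systems would use a proper
    FPE cipher rather than hash truncation."""
    digest = hashlib.sha256(key + number.encode()).hexdigest()
    middle = f"{int(digest, 16) % 10000:04d}"
    return number[:3] + middle + number[7:]

def desensitize(text, key=b"demo-key"):
    """Desensitize every matched phone number in a free-text field."""
    return PHONE_RE.sub(lambda m: pseudonymize(m.group(), key), text)
```

Because the replacement is keyed and deterministic, the same input maps to the same pseudonym, which keeps joins across desensitized datasets consistent.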
An isolation forest is constructed with the Isolation Forest algorithm: multiple isolation trees are built by recursively and randomly partitioning the data on its attributes, and the average path length of each sample point across the trees is calculated, where a shorter path length means a higher anomaly score. When the anomaly score exceeds a set threshold, such as 0.6, an anomaly event is determined and an alarm is triggered. Meanwhile, deep learning algorithms such as LSTM model the time series of system indicators such as CPU utilization and memory occupancy, and model training and parameter tuning continuously improve the accuracy and timeliness of anomaly detection. For example, with a Split-Brain Autoencoder model, the encoder uses an LSTM layer and several fully connected layers, the decoder restores the input data through a symmetric structure, the reconstruction error measures the degree of abnormality of the data, and when the error exceeds the normal value by three standard deviations, an abnormal time point is determined.
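The final three-standard-deviations rule is independent of the model producing the errors and can be sketched with the standard library; the baseline values below are illustrative reconstruction errors from "normal" periods.

```python
import statistics

def three_sigma_anomalies(errors, baseline):
    """Flag indices whose reconstruction error deviates from the baseline mean
    by more than three standard deviations, per the document's rule."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return [i for i, e in enumerate(errors) if abs(e - mu) > 3 * sigma]
```

With a baseline of errors averaging 1.0 (sample standard deviation about 0.14), a new error of 2.0 is flagged while 1.1 and 0.95 pass, matching the intended behavior of the autoencoder-based detector.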
In the preferred embodiment, application dependency analysis tools such as Application Insights and Dynatrace are used to deeply analyze API calls, configuration files and database connection information in application program code, so that the dependency relationships among services are extracted, providing a solid foundation for understanding how system components interact. Then, a system asset topology graph showing the data flow direction is drawn by combining an architecture design tool such as Enterprise Architect with the network asset basic data and the dependency relationships, so that the deployment relationships, interface relationships and data flow directions of the components in the system are clear at a glance. On this basis, assets playing a key role in the operation of the system are identified, and security threat analysis is performed by using the STRIDE threat modeling method, so that potential risk points of the assets are systematically identified. Finally, by summarizing and evaluating the potential risk points, a system asset risk list is formed, which not only clarifies the risk condition of the assets but also provides a basis for subsequent risk management and mitigation measures, remarkably improving the enterprise's control over the security of key system assets.
And S03, drawing a data asset flow chart according to the system asset topological graph, and identifying safety risks existing in the sensitive data asset according to the data asset flow chart to obtain a data asset risk list.
As a preferred embodiment of the first embodiment, the drawing a data asset flow chart according to the system asset topology chart, and identifying security risks existing in the sensitive data asset according to the data asset flow chart, to obtain a data asset risk list, specifically:
and drawing a data asset flow chart for the data flows by adopting the data flow diagram (DFD) method with a flowchart tool such as Visio, determining the circulation paths and access conditions of data within the enterprise, and identifying the security risks existing in sensitive data assets according to data classification and risk assessment methods to form a data asset risk list.
A data asset flow chart inside the enterprise is drawn for the data flows by adopting the data flow diagram (DFD) method with a flowchart tool such as Visio, obtaining the circulation paths of data among different business systems, departments and staff. According to the data circulation paths, and with reference to common data classification methods, the data assets are classified along the dimensions of data source, confidentiality requirement and importance degree. By comparison with legal regulations, industry standards and enterprise policy requirements, data assets are divided into different security protection levels: public, internal, sensitive and confidential. The access conditions of the data assets in the circulation process are analyzed according to the data flow diagram and the data classification and grading results. By identifying the security risk points of data leakage, unauthorized access and data tampering existing in each link, the likelihood and influence degree of each risk point are evaluated, and the risk value of the data asset is calculated. For the identified data security risk points, qualitative and quantitative assessment methods common in the information security field, such as OCTAVE and FRAP, are used as a reference. By evaluating both the likelihood of risk occurrence and the degree of influence, it is determined whether the risk level exceeds an acceptable threshold. If a high-risk link involving a sensitive data asset is concerned, a specific security protection scheme is formulated from a data full-lifecycle perspective, and corresponding security protection measures are adopted according to the characteristics of each stage of data acquisition, transmission, storage, access, processing and destruction.
Specifically, for the direct data flow relation of different systems in the system asset topological graph, a data flow graph DFD method is adopted, flow chart tools such as Visio and the like are used for drawing the data asset flow graph in an enterprise, the circulation paths of data among different business systems, departments and staff are clear, and key nodes such as the source, the destination and the processing process of the data asset are identified through the analysis of the data flow, so that a foundation is provided for the subsequent data security risk analysis. In the process of drawing a data flow graph, a structured mode is adopted to comb data assets, common data classification methods are referred, such as classification is carried out based on dimensions such as sources, confidentiality requirements and importance degrees of data, and the data are classified into different security protection levels such as public levels, internal levels, sensitive levels and confidential levels according to requirements of laws and regulations, industry standards and enterprise policies, management and control requirements of various data assets are defined, a data classification framework of an enterprise is formed, and basis is provided for making a data security policy. According to the data flow graph and the data classification and grading result, analyzing the access condition of the data asset in the circulation process, identifying the possible safety risk points of data leakage, unauthorized access, data tampering and the like in each link, evaluating the possibility and influence degree of each risk point, and calculating the risk value of the data asset. 
Risk levels are represented by color grades, with red representing high risk, yellow representing medium risk and green representing low risk; the size of a risk point represents the likelihood of risk occurrence; detailed information of each risk point is displayed by hovering the mouse; and a data asset risk heat map is finally formed to visually present the high-risk areas. For the identified data security risk points, qualitative and quantitative evaluation methods commonly used in the information security field, such as OCTAVE and FRAP, are used to evaluate the risk from the two dimensions of likelihood and degree of influence. Qualitative evaluation scores each risk on a 1-5 scale by means of risk factor questionnaires, brainstorming and the like to form a risk matrix diagram. Quantitative evaluation uses the annual loss expectancy (ALE) calculation formula ALE = Σ (asset value × risk occurrence probability × vulnerability exposure coefficient), and risk priorities are ordered by the magnitude of the ALE value. If the risk level exceeds the acceptable threshold, corresponding security protection measures need to be formulated; if the risk level is lower, general management and technical control measures can be adopted according to the cost-effectiveness principle. For high-risk links involving sensitive data assets, specific security protection schemes are formulated from a data full-lifecycle perspective. In the data acquisition stage, sensitive data is desensitized using technologies such as data masking and data pseudonymization to hide the original information. In the data transmission stage, encryption protocols such as SSL/TLS are adopted to prevent data from being stolen during network transmission. In the data storage stage, static data security is protected using technologies such as transparent data encryption and column-level encryption.
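The ALE formula above can be expressed directly in Python. A minimal sketch with hypothetical function names and illustrative figures, not values from the source:

```python
def annual_loss_expectancy(risk_points):
    """ALE = sum over risk points of
    asset_value * risk_occurrence_probability * vulnerability_exposure_coefficient,
    per the formula in the text."""
    return sum(value * prob * exposure for value, prob, exposure in risk_points)

def rank_by_ale(named_points):
    """Order (name, value, probability, exposure) tuples by single-point ALE,
    highest first, to prioritize risk treatment."""
    return sorted(named_points, key=lambda r: r[1] * r[2] * r[3], reverse=True)
```

For example, an asset worth 100,000 with a 10% annual occurrence probability and a 0.5 exposure coefficient contributes 5,000 to the ALE, so it outranks a 50,000 asset at 20% probability and 0.1 exposure (which contributes 1,000).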
In the data access stage, methods such as role-based access control and the principle of least privilege are adopted to strictly limit data access permissions. In the data processing stage, privacy protection technologies such as secure multiparty computation and homomorphic encryption are adopted, so that data analysis and mining are realized while the confidentiality of data is protected. In the data destruction stage, methods such as repeated overwriting and physical shredding are adopted to ensure that discarded data cannot be recovered. Meanwhile, through technical means such as a data leakage prevention (DLP) system and database auditing, the access behavior of sensitive data is monitored and audited in real time, and abnormal operations are discovered and blocked in time. Based on the data asset risk assessment, the identified data security risks are summarized, and a data asset risk list of the enterprise is generated by combining factors such as risk level, influence range and correction difficulty. When formulating the risk treatment plan, different risk treatment strategies such as risk avoidance, risk mitigation, risk transfer and risk acceptance are formulated by combining factors such as the cost benefit, technical feasibility and business influence of the risk response. Meanwhile, data security risk assessment and auditing are carried out regularly, new data security risks are continuously identified and assessed, and the data security protection strategy is optimized to ensure the security and controllability of data assets. In drawing a data flow diagram, the Microsoft Visio tool may be used to trace the path of data through the enterprise by dragging data flow diagram elements such as external entities, data flows, processes and data stores.
The data transmission process from business system A to database B and then to application C is represented with arrowed connection lines indicating the data flow direction, annotated with attributes such as data content and flow volume. For complex data flow diagrams, a hierarchical drawing approach can be adopted: a top-level overview chart (Level 0) is drawn first and then refined layer by layer into sub-process charts (Level 1/2/3). In data classification and grading, with reference to standards such as NIST SP 800-53, data can be classified into three protection levels of high, medium and low along the three dimensions of confidentiality, integrity and availability. Private customer data such as identity card numbers and bank card numbers have high confidentiality requirements, business data such as order amounts and stock quantities have medium integrity requirements, and public data such as company news and product manuals have low availability requirements. A decision tree algorithm is adopted to classify each data item step by step through a series of judging conditions, finally forming a data classification matrix. In the risk assessment process, the STRIDE threat modeling method can be used to identify the security threats faced by data from the six dimensions of spoofing, tampering, repudiation, information disclosure, denial of service and elevation of privilege. If a database administrator account is stolen, this belongs to the elevation-of-privilege threat; if a staff member missends a mail containing sensitive information, this belongs to the information-disclosure threat.
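The step-by-step judging conditions described above can be sketched as a chain of rules in Python. This is a hypothetical hand-written decision chain illustrating the idea, not an actual trained decision tree; the function name and rating keys are assumptions.

```python
def protection_level(item: dict) -> str:
    """Map an item's confidentiality/integrity/availability ratings
    (each 'high'/'medium'/'low') to a protection level via a chain of
    judging conditions, in the spirit of the decision tree in the text."""
    if item["confidentiality"] == "high":
        return "high"      # e.g. ID card numbers, bank card numbers
    if item["integrity"] in ("high", "medium") or item["confidentiality"] == "medium":
        return "medium"    # e.g. order amounts, stock quantities
    return "low"           # e.g. company news, product manuals
```

Applying the function to every data item yields the rows of the data classification matrix.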
A qualitative assessment method is adopted in which security specialists score the occurrence probability (Probability) and influence degree (Impact) of each threat to form a 5x5 risk matrix, where a score of 1 represents Very Low, 2 represents Low, 3 represents Medium, 4 represents High and 5 represents Very High. In combination with the importance weight (Weight) of the data asset, the risk value (RiskScore) faced by each data item is calculated as RiskScore = Probability × Impact × Weight. For high-risk sensitive data, a Format-Preserving Encryption (FPE) algorithm can be adopted during desensitization to perform equal-length substitution on structured data while maintaining the original data format. In the aspect of data encryption, the SM4 national standard block cipher algorithm can be adopted: data is encrypted block by block using a 128-bit key, each block is 128 bits long, and the encryption process comprises 32 rounds of iteration, each round performing operations such as nonlinear transformation, linear transformation and round key addition, which can effectively resist differential and linear attacks. During data security auditing, a rule-based detection engine can be used to periodically scan database operation logs, and suspicious behaviors are discovered through preset audit rules (such as sensitive table access and bulk data export). Meanwhile, a machine learning algorithm is used to establish a user behavior baseline, and abnormal behavior patterns are identified through models such as clustering and classification.
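The RiskScore formula is simple enough to state in code. A minimal sketch assuming the 1-5 rating scale from the text; the function name is hypothetical.

```python
SCALE = {1: "Very Low", 2: "Low", 3: "Medium", 4: "High", 5: "Very High"}

def risk_score(probability: int, impact: int, weight: float) -> float:
    """RiskScore = Probability x Impact x Weight, per the formula in the text.
    probability and impact are 1-5 ratings; weight is the importance weight
    of the data asset."""
    if probability not in SCALE or impact not in SCALE:
        raise ValueError("probability and impact must be 1-5 ratings")
    return probability * impact * weight
```

Sorting data items by this score yields the same priority ordering as reading the 5x5 matrix cell by cell, scaled by asset importance.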
If a K-Means clustering algorithm is used, user behaviors are classified into normal and abnormal according to features such as database session duration, amount of data accessed and operation type; when the behavior of a session deviates from the center point of the normal class by more than a set threshold (such as 2 standard deviations), it is judged abnormal and a security alarm is triggered. When formulating the data security policy, the risk minimization principle is followed: strict control measures such as prohibiting external transfer, enforcing encryption and frequent auditing are adopted for high-risk data, while relatively loose policies can be adopted for low-risk data, balancing business flexibility with security compliance.
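The deviation-from-centroid rule above can be sketched in Python. This is not a full K-Means implementation: it assumes a single "normal" cluster whose centroid is the mean of all sessions, and flags sessions whose distance from that centroid exceeds the mean distance by more than 2 standard deviations, as the text describes. Function name and feature layout are assumptions.

```python
import math
from statistics import mean, pstdev

def flag_anomalous_sessions(sessions, k_sigma=2.0):
    """sessions: list of feature vectors, e.g. [duration, rows_accessed,
    operation_type_code]. Returns indices of sessions whose distance from
    the centroid exceeds mean_distance + k_sigma * std_of_distances."""
    dims = len(sessions[0])
    centroid = [mean(s[i] for s in sessions) for i in range(dims)]
    dists = [math.dist(s, centroid) for s in sessions]
    mu, sigma = mean(dists), pstdev(dists)
    return [i for i, d in enumerate(dists) if d > mu + k_sigma * sigma]
```

In practice the features would first be normalized so that no single dimension (such as rows accessed) dominates the Euclidean distance.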
In the preferred embodiment, the present application outlines the path of data flow in the system by tracking the start and end points of the data flow. With this flow path information, a dataflow graph (DFD) approach is used to detail the dataflow graph, which visually illustrates the general view of the data flow. The data flows in the flow graph are then classified and ranked, identifying abnormal data flows that may be predictive of potential security risks. These abnormal data flows are subjected to in-depth risk assessment to identify possible security risk points during data streaming. And finally, summarizing all the identified security risk points to form an exhaustive data asset risk list. The risk list records specific information of each security risk point in detail, including the positions of the security risk points in the data asset flow chart and potential security threats, provides a clear data security risk management view for enterprises, and is beneficial to the enterprises to take targeted data protection measures and enhance data security.
S04, according to a preset big data analysis method, identifying occurrence processes and influence ranges of all risk events on a system asset risk list and a data asset risk list to obtain a risk event analysis report.
In a preferred embodiment of the first embodiment, according to the preset big data analysis method, the occurrence process and the influence range of each risk event are identified on the system asset risk list and the data asset risk list, so as to obtain a risk event analysis report, which specifically includes:
acquiring risk event information including intrusion detection, virus infection and data leakage existing in enterprise assets according to mapping results of a system asset risk list and a data asset risk list, acquiring and classifying the risk events, analyzing by utilizing big data, mining attack means and attack paths behind the events, tracking occurrence processes and influence ranges of the events, and forming a risk event analysis report.
As shown in fig. 3, data packets, traffic and session information of the network layer are collected in real time by deploying security monitoring devices such as an intrusion detection system, antivirus software and a data leakage prevention system. Suspicious security events are identified from the session information using regular expression matching and signature detection methods. A big data processing platform including Hadoop and Spark, with parallel computing models such as MapReduce and RDD, is adopted to preprocess the security event data and extract features. Noise data is filtered through ETL operations of data cleaning, data conversion and data reduction, and unstructured data is converted into a structured form. The structured security event data is intelligently analyzed from multiple dimensions using machine learning algorithms. Through cluster analysis, event sets of similar attack activities are identified. Association rule mining algorithms, including Apriori and FP-Growth, discover association rules among event attributes. By integrating with a threat intelligence platform, security events detected in the enterprise are correlated with external threat intelligence data such as malicious IPs, domain names, samples and attack tactics, the origin of an event is traced, and the influence degree and hazard range of the event are judged.
Specifically, according to the mapping results of the system asset risk list and the data asset risk list, risk event information existing in the enterprise assets is obtained: by deploying security monitoring devices such as an intrusion detection system, antivirus software and a data leakage prevention system, information such as data packets, traffic and sessions of the network layer is collected in real time, together with information such as processes, files, registries and accounts of the host layer, information such as user behaviors, business operations and abnormal errors of the application layer, and external threat intelligence data such as malicious IPs, domain names, samples and attack methods. Suspicious security events are identified using methods such as regular expression matching and signature detection, and the events are automatically classified according to attributes such as threat level, attack stage and target asset to form a structured event data set. A big data processing platform such as Hadoop or Spark, with parallel computing models such as MapReduce and RDD, is adopted to preprocess the collected massive security event data and extract features; noise data is filtered through ETL operations such as data cleaning, data conversion and data reduction, unstructured data is converted into a structured form, the key attributes that best represent event characteristics are selected through a feature selection algorithm, and a data set suitable for mining analysis is constructed. On the big data analysis platform, the security events are intelligently analyzed from multiple dimensions using machine learning algorithms.
Common event clustering dimensions and distance measurement methods are adopted. For clustering by attack type, the attack types of events, such as scanning, injection and vulnerability exploitation, are converted into 0-1 vectors by One-Hot encoding, and Euclidean distance measures the similarity between events. For clustering by attack source, the attack source IP address of each event is represented by its numerical value, and cosine distance measures the similarity between IPs. For clustering by target asset, the target asset types of events, such as server, database and application system, are given hierarchical vector representations, and Jaccard distance measures the similarity between assets. Through cluster analysis, event sets of similar attack activities are identified. For each class of security event, association rule mining algorithms such as Apriori and FP-Growth discover association rules among event attributes, for example that certain types of attacks usually cause specific system anomalies; causal relations among events are then judged, and the occurrence path of the attack chain is deduced.
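The encodings and distance measures above can be sketched in a few Python helpers. A minimal illustration with assumed category lists and function names; a real pipeline would use a feature-encoding library rather than hand-rolled helpers.

```python
import math

ATTACK_TYPES = ["scan", "injection", "exploit"]  # illustrative categories

def one_hot(attack_type: str):
    """0-1 vector encoding of a categorical attack type."""
    return [1.0 if t == attack_type else 0.0 for t in ATTACK_TYPES]

def euclidean(a, b) -> float:
    """Euclidean distance between two feature vectors."""
    return math.dist(a, b)

def cosine_sim(a, b) -> float:
    """Cosine similarity; cosine distance is 1 minus this value."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def jaccard_dist(a, b) -> float:
    """Jaccard distance between two sets of asset labels."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b) if (a | b) else 0.0
```

Events whose pairwise distances fall below a chosen threshold are then grouped into the same attack-activity cluster.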
A graph-based heterogeneous network analysis method is adopted: the attacker IP, the attack event and the victim asset in each event record are extracted as nodes of the graph, with different node type attributes; directed edges are established between related nodes according to information such as timestamps, event types and attack methods in the event records, describing the association relationships and sequence among the nodes; and weight attributes are set on the edges, weighted by the danger degree and occurrence frequency of the event to represent association strength. The complex associations between event nodes are stored in graph databases such as Neo4j and JanusGraph, and graph algorithms such as PageRank and shortest path are used to calculate the importance and correlation of attack events in the network, revealing the profiles, attack capabilities and attack preferences of the attackers behind the events. In the event analysis process, by integrating with a threat intelligence platform, the security events detected in the enterprise are correlated with external threat intelligence data such as malicious IPs, domain names, samples and attack tactics, the origin of each event is traced, and the influence degree and hazard range of the event are judged. Situation awareness technology is adopted to integrate internal and external security event data and describe the overall network security threat situation faced by the enterprise.
According to the event analysis results, a multi-dimensional security event analysis report is output: from a macroscopic view, the distribution of attack types, attack sources and attack means suffered by the enterprise over a period of time is presented, revealing the overall trend of security threats; from a microscopic view, the detailed process of a major security event is presented, including the attack chain, influence range, loss evaluation and disposal measures of the event, so that security managers can comprehensively grasp the security event situation and guide the optimization of the security protection strategy. A real-time security event monitoring and early warning mechanism is established by deploying the event analysis model to a stream processing engine such as Flink or Storm to perform real-time computation on newly acquired security event data. Statistical anomaly detection algorithms such as Z-Score and MAD calculate the degree to which samples deviate from the mean to judge whether they are abnormal; high-dimensional anomaly detection algorithms such as PCA and KNN find anomalous points in a high-dimensional event space through dimensionality reduction or neighborhood analysis; and time-series anomaly detection algorithms such as S-H-ESD and ARIMA find anomalous time points by analyzing the period, trend and residual of the time-series data.
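The Z-Score detector mentioned above is a one-liner in spirit. A minimal batch-mode sketch (a streaming engine would maintain running mean and variance incrementally); the function name and the 3-sigma default are assumptions consistent with common practice.

```python
from statistics import mean, pstdev

def zscore_anomalies(samples, threshold=3.0):
    """Return indices of samples whose Z-Score (deviation from the mean in
    units of standard deviation) exceeds the threshold."""
    mu, sigma = mean(samples), pstdev(samples)
    if sigma == 0:
        return []  # constant series: nothing deviates
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]
```

MAD-based detection works the same way but replaces the mean/standard deviation pair with the median and median absolute deviation, making it robust to the outliers it is trying to find.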
User behavior modeling analysis counts users' behavior patterns such as login time, operation frequency and resource usage to construct user profiles; clustering methods divide users into different groups, with the overall characteristics of each group serving as the standard for anomaly judgment; and by mining the time sequence of user behaviors, association rules and sequential patterns between operations at different time points are discovered, and whether a user's behavior trajectory is abnormal is judged based on these frequent patterns and association rules. According to preset threshold rules, security alarms of different grades are automatically triggered, and security operation and maintenance personnel are notified by mail, short message, work order and the like to carry out emergency handling, so that the loss caused by a security event is minimized. Meanwhile, the event feature engineering and machine learning algorithms are continuously optimized to improve the timeliness and accuracy of event detection and early warning. When collecting security event data, the Snort intrusion detection system can be used to detect and alarm on malicious traffic in the network in real time by defining rules such as "alert tcp any any -> 192.168.1.0/24 80"; the Splunk log management system can be adopted to collect various log data from network devices, operating systems, application systems and the like through its Forwarder component, with a daily collection volume of up to 50 GB. Regular expressions are adopted to extract user names and the like, and frequent abnormal access and sensitive user operations are identified by counting the frequencies of IP addresses and user names.
Massive logs are stored and processed on a Hadoop distributed platform: structured logs undergo ETL cleaning through Hive SQL, unstructured text undergoes word segmentation, stop-word removal and other processing through MapReduce programs, and more than 20 feature fields such as attack time, attack type, attack source and target asset are extracted. A PCA principal component analysis algorithm is adopted, selecting the first k components whose cumulative variance contribution rate exceeds 95% as the event feature set. During event cluster analysis, the K-Means algorithm performs unsupervised clustering on events, dividing events whose feature vectors are closest by Euclidean distance into the same cluster, with the number of clusters k=5 and a maximum of max_iter=100 iterations, run 10 times to obtain the optimal clustering result. For attack source IP clustering, each IP is converted into a 32-bit integer representation, the logarithm of the integer is taken to reduce the influence of excessive numerical differences, cosine similarity between IP vectors is then calculated, and IPs with a cosine value greater than 0.8 are divided into the same cluster. The Apriori association rule mining algorithm, with minimum support min_sup=0.05, minimum confidence min_conf=0.8 and maximum frequent itemset size max_len=5, discovers association rules between attack events, such as "vulnerability scanning event ∧ brute force cracking event → remote control event (sup=0.08, conf=0.85)", revealing a typical penetration attack chain.
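The 95% cumulative-variance cutoff can be illustrated with a simplified selector. Note the assumption: real PCA ranks eigenvalues of the covariance matrix of all features jointly; this sketch ranks per-feature variances as a stand-in to show the cumulative-contribution mechanics, and the function name is hypothetical.

```python
from statistics import pvariance

def select_features(columns, threshold=0.95):
    """columns: dict of feature name -> list of values. Rank features by
    variance and keep the smallest prefix whose cumulative share of total
    variance reaches `threshold` (the 95% criterion from the text)."""
    ranked = sorted(((pvariance(v), name) for name, v in columns.items()),
                    reverse=True)
    total = sum(v for v, _ in ranked)
    kept, cum = [], 0.0
    for var, name in ranked:
        kept.append(name)
        cum += var
        if cum / total >= threshold:
            break
    return kept
```

Features would normally be standardized first; otherwise a large-scale field such as byte counts dominates the variance ranking, as the test below deliberately shows.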
When constructing the event graph, the IP, port, attack type, attack time, attack target and the like of an attack event are extracted as nodes of the graph, the nodes connected to the same event form a complete attack path, and attack severity is set as the node weight: high-risk attacks are weighted 1.0, medium-risk attacks 0.6 and low-risk attacks 0.3, while the degree centrality and closeness centrality of a node reflect its importance in the attack. The PageRank iterative propagation algorithm calculates the importance weight of each node according to the link relations among nodes, with damping coefficient d=0.85 and maximum iteration count max_iter=100, and the top-10 nodes by weight serve as key attack nodes. The shortest attack paths among the key nodes are calculated through a shortest path algorithm, revealing the shortest attack path from initial penetration to the final control target, with the path length serving as a measure of attack complexity. When performing threat intelligence correlation analysis, IOC indicators such as the domain names, IPs and sample hashes involved in an event are submitted via a threat intelligence query API to open threat intelligence libraries such as VirusTotal and AlienVault to obtain the malicious scores of the IOCs and threat intelligence labels such as the associated C&C infrastructure and attack organization, and IOCs with scores greater than 7 are judged to be high-risk threats. Combining the intelligence labels with the cluster analysis results, the threat level distribution of various events is counted, and attack events are divided by high, medium and low danger to form a network threat situation distribution map.
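The PageRank propagation with d=0.85 and max_iter=100 can be sketched as a plain power iteration. A minimal illustration under assumptions: every node has at least one outgoing edge (no dangling-node handling), and a graph database such as Neo4j would normally run this at scale.

```python
def pagerank(edges, d=0.85, max_iter=100):
    """edges: list of (source, target) pairs. Returns node -> rank.
    Classic iteration: rank(n) = (1-d)/N + d * sum over in-neighbors m of
    rank(m) / outdegree(m)."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [b for a, b in edges if a == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        rank = {
            n: (1 - d) / len(nodes)
            + d * sum(rank[m] / len(out[m]) for m in nodes if n in out[m])
            for n in nodes
        }
    return rank
```

In the attack graph, a node that many attack steps funnel into (for example a shared database server) accumulates rank and surfaces among the key attack nodes.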
When detecting abnormal behavior, an LSTM neural network algorithm is used with the user behavior sequence of the last 30 days as input and 100 LSTM neurons in the middle layer; the trained model predicts the user's operation sequence, and a sequence prediction error exceeding 2 standard deviations is regarded as abnormal. The SHAP technique is used to explain the main basis of the abnormal behavior judgment, the top-5 behavior features with the largest contribution are extracted, and a user abnormal behavior pattern library is generated, realizing interpretable anomaly detection.
In the preferred embodiment, the present application obtains risk events existing in the enterprise assets according to the system asset risk list and the data asset risk list by using a preset big data analysis technology, and these events cover key security problems such as intrusion detection, virus infection, data leakage, etc. The risk events are then systematically collected and classified, ensuring that each event is accurately identified and archived. Then, the big data analysis technology is used for deep mining, and attack means and attack paths behind each risk event are identified, wherein the step is realized by analyzing patterns and abnormal behaviors in event data. Finally, based on these means of attack and paths, the occurrence and scope of influence of each risk event is identified in detail, including how the event began, how it propagated, and its specific impact on the enterprise asset. Through the coherent analysis process, a comprehensive risk event analysis report is finally formed, the report records the detailed information of the event and the influence on the enterprise safety in detail, and decision support of risk management and response is provided for the enterprise.
S05, constructing a causal chain of each risk event according to the risk event analysis report to obtain a risk source of each risk event.
In a preferred embodiment of the first embodiment, the constructing a causal chain of each risk event according to the risk event analysis report to obtain a risk source of each risk event specifically includes:
Causal reasoning is adopted: according to the risk event analysis report, the fundamental factors of event occurrence are inferred; a causal chain of event occurrence is constructed by combining asset vulnerabilities, system vulnerabilities and data flow direction factors; and the position and importance degree of each risk point in the causal chain are determined to obtain the root of the risk.
A root cause analysis method is adopted, forming a structured causal reasoning framework through the process of identifying the problem, collecting data, making hypotheses, verifying hypotheses, identifying causes and taking corrective measures. According to the causal reasoning framework, the various vulnerability factors of technical loopholes, management defects and process omissions existing in the enterprise information system are comprehensively combed, and key risk points are identified. By performing full-lifecycle flow tracking on the enterprise's data assets, the circulation paths and usage conditions of each link of data acquisition, transmission, storage, processing and exchange are defined. An attack graph modeling method is adopted to correlate the data flow paths and the risk factor nodes identified from the usage conditions in the form of attack paths. All attack paths are enumerated through model checking and logical reasoning to form a complete causal chain of risk event occurrence. The CVSS (Common Vulnerability Scoring System) is adopted to quantify the severity of each vulnerability from the angles of attack vector, attack complexity and privilege requirements, and the hazard degree of each risk point is evaluated. The importance of each risk point in the attack path is calculated according to the vulnerability severity, and the key risk points are determined. According to the causal chain analysis and risk point assessment results, a risk traceability report is automatically generated, defining the root factors, key risk points and vulnerability factors of the risk event and giving quantified indicators of risk level, hazard score and treatment priority for each risk point.
Specifically, through causal reasoning, deep cause tracing and analysis is performed on the critical risk events identified in the risk event analysis report. A root cause analysis method is adopted that follows the process of "identify the problem - collect data - make hypotheses - verify hypotheses - identify causes - take corrective measures", exploring the deep causes of the event layer by layer until the most primitive cause is found, forming a structured causal tree that displays the logical relationship between the causes and consequences of the risk event. In the causal reasoning process, the vulnerability factors present in the enterprise information system, such as technical loopholes, management defects and process omissions, are comprehensively combed, including software and hardware vulnerabilities, weak points in the network architecture, blind spots in security protection, inadequate management systems, and lack of personnel security awareness; the degree of association and influence paths between these factors and the risk event are evaluated, and their role in the occurrence of the event is judged. Full life cycle flow tracking is performed on the enterprise's data assets to clarify the circulation paths and usage of sensitive data in links such as acquisition, transmission, storage, processing and exchange, to identify leakage risk points in the data circulation process, such as unauthorized data access, data channels lacking security protection, and overly broad data sharing scopes, and to analyze the causal relationship between these weak links and risk events.
On the basis of the risk factors identified across multiple dimensions such as asset vulnerability, system vulnerability and data flow direction, an attack graph modeling method is adopted to associate the factor nodes in the form of attack paths. The basic components of the attack graph are nodes, representing system states or vulnerabilities, and edges, representing attack behaviors. Using model checking, logical reasoning and related techniques, all possible attack paths are enumerated to form a complete causal chain of the risk event, reflecting the complete exploitation path from the initial vulnerability point to the final hazard result and intuitively displaying the driving role of each risk factor in the event. The event causal chain is then evaluated quantitatively by calculating the importance of each risk point in the attack path. The severity of each vulnerability is quantified with the CVSS (Common Vulnerability Scoring System) from dimensions such as attack vector, attack complexity and privilege requirements. The attack cost (time, money, manpower, etc.) required for an attacker to complete a given attack step is estimated, as is the probability that the attacker successfully carries out that step. The risk value of each node in the attack graph is then calculated according to the formula: node risk value = node attack cost × node attack probability × node vulnerability CVSS score; the node risk values along each complete attack path are accumulated to obtain the path risk value, from which the key nodes on the causal chain are judged, i.e. the risk points that have the greatest influence on the event and most urgently require priority handling.
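As an illustrative sketch of the path enumeration described above (all node names and graph contents are hypothetical, and a plain depth-first search stands in for a full model-checking engine), the complete set of exploitation paths from an initial vulnerability point to a final hazard state can be enumerated over an adjacency list:

```python
# Hypothetical attack graph: nodes are system states or vulnerabilities,
# directed edges are attack behaviors that move the attacker between states.
attack_graph = {
    "web_vuln": ["web_root"],          # exploiting a web vulnerability yields root on the web server
    "web_root": ["intranet_access"],   # pivot from the web server into the intranet
    "intranet_access": ["db_read"],    # lateral movement reaches the database
    "db_read": [],                     # final hazard state: data theft
}

def enumerate_attack_paths(graph, start, goal, path=None):
    """Depth-first enumeration of all simple paths from the initial
    vulnerability point to the final hazard result (the causal chain)."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    paths = []
    for nxt in graph.get(start, []):
        if nxt not in path:  # avoid revisiting states (no cycles)
            paths.extend(enumerate_attack_paths(graph, nxt, goal, path))
    return paths

paths = enumerate_attack_paths(attack_graph, "web_vuln", "db_read")
```

Each returned list is one complete causal chain; in practice the graph would be generated from scan results and vulnerability data rather than written by hand.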
Based on the causal chain analysis and the risk point assessment results, a risk tracing report is automatically generated, which clarifies the root cause, key risk points and vulnerability factors of the risk event and gives quantified indexes of risk level, hazard score and disposal priority for each risk point. For different types of risk sources, targeted security hardening suggestions are provided, such as vulnerability remediation, privilege convergence, boundary protection, process optimization and awareness training, forming a scientific and efficient closed-loop risk management process. The results of the risk tracing analysis are fed back into the security operation process; the completeness of the causal chain and the accuracy of key risk point identification are verified and evaluated through means such as attack-defense confrontation exercises, the causal reasoning model is continuously optimized to improve the reliability and practicality of risk tracing analysis, and the possibility of risk occurrence is reduced at the source according to the results of the risk root cause analysis. When performing root cause analysis of a risk event, a fishbone diagram (Ishikawa diagram) can be adopted to classify the possible factors of the event into branches such as man, machine, material, method and environment, and each branch is then explored in depth.
For example, for a network intrusion event, the "man" branch may contain factors such as insufficient security awareness of managers, improper operation by staff, and the proficiency of the attacker; the "machine" branch may contain factors such as unpatched system vulnerabilities, missing protective equipment, and insufficient monitoring capability; and the "method" branch may contain factors such as the lack of an effective security policy, an imperfect emergency plan, and insufficient process control. By checking and evaluating the factors on each branch one by one, the key cause nodes are identified, and deeper causal reasoning is then performed recursively on each node to form a complete event cause chain. For the finally located critical vulnerability, its formation cause needs to be further investigated, whether it stems from inherent defects in the software code or from negligence in system configuration, until the root cause of the problem is traced. In the vulnerability analysis process, the security weaknesses of an asset can be systematically evaluated using the STRIDE threat modeling method from six dimensions: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege. Using an asset-threat-vulnerability correspondence matrix, the threat scenarios each type of asset may face are analyzed one by one, and the weak links of the asset in those scenarios are judged. Taking a database system as an example: in the S dimension, improper permission settings on a database account may allow a low-privilege user to masquerade as an administrator; in the T dimension, the database may lack encryption measures, allowing sensitive data to be illegally tampered with; and in the I dimension, a missing database log audit function may make a data leakage event untraceable.
The risk level of each vulnerability is quantified on a 1-5 scale to obtain a vulnerability risk score for each type of asset, from which the degree of weakness of the asset is judged. When generating the attack graph, the MulVAL (Multi-host, Multi-stage Vulnerability Analysis) logical reasoning system can be used, taking the network topology, host configuration information and known vulnerabilities as input to automatically analyze all possible attack paths. MulVAL uses Datalog clauses to represent the causal relationships between attack preconditions and attack consequence states, and then uses a logical reasoning engine to recursively compute all consequence states reachable from the initial conditions. For example, the clause vulExists(webServer, VE) indicates that the web server has vulnerability VE, the clause hasAccount(attacker, webServer, root) indicates that an attacker has obtained root privileges on the web server, and a causal link exists between the two. Through iterative reasoning, a complete attack path of the form "VE vulnerability → root privilege → intranet penetration → data theft" can be identified, revealing the progressive relationship of each weak point in the attack process. In quantitative risk assessment, in addition to the common CVSS indexes, factors such as asset importance, attack frequency and influence scope can be introduced to construct a multi-dimensional security risk measurement model. Taking asset importance as an example, the analytic hierarchy process (AHP) can be adopted to assign the importance of different assets a weight of 1-10 points according to factors such as asset value and data sensitivity.
Combining the asset importance weight, attack cost, attack probability and vulnerability CVSS score of each attack node, the functional relation attack_risk(node) = asset_weight(node) × occur_prob(node) × cvss_base(node) / attack_cost(node) is established, and the risk value of each node is calculated; the higher a node's risk value, the greater its importance in the attack chain. On this basis, the risk value of each complete attack path can be further analyzed: a factor graph probabilistic reasoning model is used to solve the joint score path_risk(path) = ∏_{node ∈ path} attack_risk(node), so as to identify the top-N paths most likely to be exploited by an attacker and deploy defensive resources in a targeted manner.
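The two formulas above can be sketched directly (all node names and scores are hypothetical; the factor-graph solver is reduced here to its product form over each path):

```python
# Hypothetical per-node scores for the formula
# attack_risk = asset_weight * occur_prob * cvss_base / attack_cost
nodes = {
    "web_vuln": dict(asset_weight=6,  occur_prob=0.8, cvss_base=7.5, attack_cost=2.0),
    "web_root": dict(asset_weight=8,  occur_prob=0.5, cvss_base=8.8, attack_cost=4.0),
    "db_read":  dict(asset_weight=10, occur_prob=0.3, cvss_base=9.1, attack_cost=5.0),
}

def attack_risk(n):
    s = nodes[n]
    return s["asset_weight"] * s["occur_prob"] * s["cvss_base"] / s["attack_cost"]

def path_risk(path):
    # Joint score of a complete attack path: product of its node risk values.
    r = 1.0
    for n in path:
        r *= attack_risk(n)
    return r

candidate_paths = [["web_vuln", "web_root", "db_read"],
                   ["web_vuln", "db_read"]]
# Top-N paths most likely to be exploited (here N = 1).
top_n = sorted(candidate_paths, key=path_risk, reverse=True)[:1]
```

Defensive resources would then be concentrated on the nodes of the top-ranked paths.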
As a preferred implementation of the first embodiment, the application further acquires internal and external data related to the risk event through multi-channel data acquisition, where the internal and external data cover threat intelligence, vulnerability databases and security communities; data fusion is used to perform association analysis between the acquired data and internal enterprise data, so as to enrich the background information and clarify the influence scope of the risk event.
Data related to the security threats faced by the enterprise, including IOC indicators such as malicious IP addresses, domain names, URLs and file hashes, is continuously acquired by subscribing to threat intelligence sources. Unstructured threat intelligence is extracted and organized using natural language processing, with named entity recognition, keyword extraction, sentiment analysis and topic modeling methods applied to obtain structured threat intelligence data. The latest vulnerability information is periodically acquired from authoritative vulnerability databases via web crawlers. Based on this vulnerability information, a knowledge graph is used to build a mapping between enterprise assets and known vulnerabilities, and high-risk vulnerabilities in the software and hardware systems used by the enterprise are identified. Underground threat intelligence is acquired by participating in channels such as hacker forums and security communities. For this underground intelligence, social network analysis is used to mine the association relationships between users of different communities and to identify active threat actors and behind-the-scenes operators. Threat intelligence, vulnerability data and security events are stored uniformly on a big data platform using a data warehouse. For multi-source heterogeneous data, data fusion is applied: data of different sources, formats and granularities is converted into a uniform structured form through an ETL process of cleaning, normalization and association. Data mining is then applied to the fused multi-source data set to perform association analysis between threat intelligence and internal enterprise data. Frequent itemset and association rule mining algorithms are adopted to discover the intrinsic links between threat events, vulnerable assets and attack techniques from massive data.
A clustering algorithm is adopted to divide scattered data points into different clusters according to the similarity of data attributes, mining event clusters with similar attack patterns and risk characteristics. A graph database is used to map the data obtained from association analysis into nodes and edges, constructing a complex multi-dimensional association network graph. On the basis of this graph, a graph embedding algorithm is adopted to automatically learn low-dimensional vector representations of event nodes and to mine high-order correlations between them. Finally, the results of data fusion, association analysis, clustering and graph construction are integrated to form a more comprehensive, multi-faceted and accurate depiction of the risk event, providing data support for subsequent risk assessment, early warning and decision making.
Specifically, by subscribing to threat intelligence sources, including open-source intelligence, commercial intelligence and industry intelligence, data related to the security threats faced by the enterprise is continuously acquired, such as IOC indicators (malicious IP addresses, domain names, URLs, file hashes) and background information on APT organizations and attack techniques. Unstructured threat intelligence is extracted and collated using natural language processing techniques: named entity recognition with regular expressions and dictionary matching extracts entity objects such as IP addresses, domain names and hash values from intelligence text; keyword extraction algorithms such as TF-IDF and TextRank automatically identify the keywords of the intelligence and classify its topic attributes; sentiment analysis judges the emotional tendency in the intelligence description to gauge the severity of attack events and the risk of vulnerabilities; and topic modeling methods such as LDA and LSI discover latent topic distributions from large intelligence corpora to realize clustering of threat intelligence.
The latest vulnerability information, including vulnerability descriptions, affected versions and exploit code, is periodically obtained via web crawling from authoritative vulnerability databases such as the NVD (National Vulnerability Database) and CVE (Common Vulnerabilities and Exposures) lists; a knowledge graph is used to build a mapping between enterprise assets and known vulnerabilities, so that possible high-risk vulnerabilities in the software and hardware systems used by the enterprise are found in time and their potential impact on the business system is evaluated. By participating in channels such as hacker forums and security communities, underground threat intelligence is obtained, such as malware sales information, data leakage samples and the latest attack methods of attacker groups; social network analysis is used to mine the association relationships between users of different communities, identify active threat actors and behind-the-scenes operators, and gain insight into the full picture of targeted attack campaigns. Based on this multi-source heterogeneous data collection, threat intelligence, vulnerability data and security events are stored uniformly on a big data platform such as Hive using data warehouse technology, facilitating integrated data management and association analysis. Meanwhile, data fusion is used to convert data of different sources, formats and granularities into a uniform structured form through an ETL process of data cleaning, normalization and association.
In the data cleaning step, the raw data is preprocessed by de-duplication, de-noising and format conversion to improve data quality; in the normalization step, unified field names, data types and measurement units are adopted for similar data from different sources to eliminate semantic ambiguity; in the association step, scattered data is merged and connected according to key attributes such as timestamps and object IDs to build a relational network between the data. Finally, a global associated view of the multi-source security data is established, laying the foundation for deep mining and analysis. Data mining is then applied to the fused multi-source data set to perform association analysis between threat intelligence and internal enterprise data. Frequent itemset and association rule mining algorithms, such as Apriori and FP-Growth, are adopted to discover the intrinsic links between threat events, vulnerable assets and attack methods from massive data; for example, it may be found that a certain family of malware is usually associated with specific C2 (command-and-control) servers, from which the actors behind a targeted attack campaign can be inferred. Clustering algorithms such as K-Means and DBSCAN divide scattered data points into different clusters according to the similarity of data attributes, mining event clusters with similar attack patterns and risk characteristics and identifying security events with a wider scope of influence. Statistical characteristics of each event cluster, such as time distribution, spatial distribution and target type, are analyzed to judge the severity and potential risk of the events.
The data obtained from association analysis is mapped into nodes and edges using a graph database technology such as Neo4j, constructing a complex multi-dimensional association network graph that intuitively presents the evolution trajectory and propagation path of risk events in dimensions such as time, space and logic. On the basis of this graph, a graph embedding algorithm such as DeepWalk or GraphSAGE is adopted to automatically learn low-dimensional vector representations of event nodes. The core idea of graph embedding is to map the nodes of a graph into a low-dimensional vector space so that nodes linked in the graph also have similar vector representations. This representation learning method can automatically extract the semantic features contained in the network structure and mine high-order correlations between event nodes. Based on the vectorized representations of event nodes, algorithms such as spectral clustering and K-nearest neighbors can further mine hidden event associations from the perspective of semantic similarity. Finally, the results of data fusion, association analysis, clustering and graph construction are integrated to form a more comprehensive, multi-faceted and accurate depiction of the risk event, providing data support for subsequent risk assessment, early warning and decision making. In the process of acquiring and extracting threat intelligence, regular expressions can be used to rapidly extract IOC indicators from unstructured intelligence text. For each extracted IOC indicator, a sentiment dictionary and dependency parsing can be used to judge whether its context indicates maliciousness. For example, if the description "domain.com implanted malicious code" appears in the intelligence, dependency parsing of the relation between "implanted" and "malicious" allows the domain name to be judged malicious.
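The regex-based IOC extraction mentioned above can be sketched as follows (the patterns are deliberately simple and illustrative, not production-grade, and the sample report text is hypothetical):

```python
import re

# Simple, illustrative patterns for common IOC indicator types.
IOC_PATTERNS = {
    "ipv4":   re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "md5":    re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|io)\b"),
}

def extract_iocs(text):
    """Extract candidate IOC indicators from unstructured intelligence text."""
    return {kind: pat.findall(text) for kind, pat in IOC_PATTERNS.items()}

report = ("Beaconing to evil-domain.com from 10.2.3.4, "
          "payload md5 d41d8cd98f00b204e9800998ecf8427e")
iocs = extract_iocs(report)
```

A second pass (sentiment dictionary or dependency parsing, as described in the text) would then decide whether each candidate's context actually indicates maliciousness.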
When classifying intelligence, word embedding models such as Word2Vec can be adopted: word vectors are trained in Skip-gram mode, and the similarity between intelligence keywords and the different categories is calculated via cosine similarity, with a similarity greater than 0.6 assigning the item to the corresponding category. When crawling structured vulnerability databases such as the NVD, an XPath parser can be used to locate the "/nvd:feed/nvd:entry" nodes and extract the description, impact and other label information below them; the BeautifulSoup library can parse key fields such as the CVE number, CVSS score and affected versions of each vulnerability; and finally the Word Mover's Distance algorithm can compute the text distance between enterprise asset version information and the vulnerability's affected versions, with a distance below 1.5 considered a match, thereby generating an asset-vulnerability mapping. In the social network analysis process, the PageRank iterative algorithm can be adopted: a directed relationship graph is constructed from user interaction behaviors, the damping coefficient is set to 0.85, all initial node weights are set to 1, user importance scores are computed iteratively until convergence, and users with scores greater than 10 are selected as key opinion leaders. Topic clustering is performed on the posts published by key users in different communities using the non-negative matrix factorization (NMF) algorithm: with the number of topics set to k = 5, TF-IDF features of the posts are extracted, a topic-word matrix and a user-topic matrix are obtained by minimizing the reconstruction error, and topic words with weights greater than 0.3 are taken as the main topics discussed by the users, thereby revealing the interaction characteristics and topic trends among different communities and users.
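The PageRank step can be sketched with plain power iteration (the interaction graph below is hypothetical; the damping coefficient 0.85 and the all-ones initialization follow the text):

```python
# Hypothetical user-interaction digraph: edge u -> v means u interacts with v.
edges = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
nodes = sorted({u for u in edges} | {v for vs in edges.values() for v in vs})

def pagerank(edges, nodes, damping=0.85, iters=50):
    """Plain power-iteration PageRank; scores converge after a few dozen rounds."""
    rank = {n: 1.0 for n in nodes}  # initial node weight 1, as in the text
    for _ in range(iters):
        new = {n: 1.0 - damping for n in nodes}
        for u, outs in edges.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
        rank = new
    return rank

scores = pagerank(edges, nodes)
```

Users whose converged score exceeds the chosen cut-off would be treated as key opinion leaders; here node "a", which receives links from both "c" and "d", ranks highest.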
In data fusion and association analysis, the multi-source heterogeneous data is stored in a graph database, and data cleaning, normalization and graph construction are realized with the Cypher query language. Entity objects and relationship edges are inserted into the graph store through CREATE and MERGE statements, and the natural node-edge-attribute representation makes it convenient to mine multi-hop associations between entities. In association rule mining, the FP-Growth algorithm can be adopted with a minimum support of 0.05 and a minimum confidence of 0.8 to generate frequent itemsets and association rules. Entities such as attack groups, attack events and target assets are converted into elements of a set, and frequent co-occurrence patterns between entities are found through iterative tree building and recursive mining, revealing the intrinsic mechanism of malicious activity. In cluster analysis of the multi-source data, the density-based DBSCAN clustering algorithm can be adopted, identifying density-connected clusters of points by computing Euclidean distances between data points. The radius parameter is set to eps = 0.5 and the density threshold to min_samples = 5, i.e. when at least 5 points lie within a neighborhood of radius 0.5 around a data point, they are density-connected into a cluster. For noise points, the local outlier factor can be computed with the LOF outlier detection algorithm; points with a factor greater than 1.5 are regarded as outliers and handled separately.
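The frequent co-occurrence mining can be sketched with an Apriori-style first pass over pairs (the incident data and the 0.5/0.8-style thresholds here are illustrative stand-ins, not the 0.05/0.8 production settings named in the text):

```python
from itertools import combinations
from collections import Counter

# Hypothetical transactions: each is the set of entities seen in one incident.
incidents = [
    {"malwareX", "c2.example", "webServer"},
    {"malwareX", "c2.example", "dbServer"},
    {"malwareX", "c2.example"},
    {"phishing", "mailServer"},
]

def frequent_pairs(txns, min_support=0.5):
    """Count co-occurring entity pairs and keep those above the support
    threshold (the candidate-generation step of Apriori/FP-Growth mining)."""
    counts = Counter(p for t in txns for p in combinations(sorted(t), 2))
    n = len(txns)
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

def confidence(pair, txns):
    """Confidence of the rule pair[0] -> pair[1]."""
    a, b = pair
    with_a = [t for t in txns if a in t]
    return sum(1 for t in with_a if b in t) / len(with_a)

pairs = frequent_pairs(incidents)
```

A rule such as "malwareX → c2.example" with confidence 1.0 would be exactly the kind of malware-to-C2 association the text describes.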
In the clustering process, attributes such as attack time, source IP and destination port are selected as feature dimensions, and the multiple attributes of attack events are clustered jointly, which comprehensively describes the similarity of events and allows event clusters with the same attack pattern to be mined. For each event cluster, time series analysis and statistical hypothesis testing can be adopted to judge whether the cluster shows a sudden burst in time and whether its event distribution differs significantly from a random distribution, so as to infer the severity and scope of influence of the events. In graph-embedding-based association analysis, the DeepWalk algorithm can generate node sequences through random walks and train a Skip-gram model on them, optimizing the log-likelihood of node co-occurrence to obtain low-dimensional vector representations of the nodes. The random walk length is set to 10, the window size to 5 and the number of negative samples to 5; after 200 training epochs, the nodes are mapped into a 128-dimensional vector space. In this space, Euclidean distances between event node vectors can be computed, with a distance below 0.5 indicating semantically similar nodes; a spectral clustering algorithm is further adopted to compute a similarity matrix between the nodes and perform K-Means clustering on it, finally discovering the event association patterns hidden behind the graph structure.
S06, constructing a risk propagation map according to the risk source, and carrying out visual display and dynamic tracking on each risk event according to the risk propagation map.
As a preferred embodiment of the first embodiment, the constructing a risk propagation map according to the risk source, and visually displaying and dynamically tracking each risk event according to the risk propagation map specifically includes:
Risk tracing is adopted to trace back the path and influence scope of risk propagation according to the causal chain of a risk event; the key nodes and propagation paths of the risk are identified through the dependency relationships and data flow directions between assets, and a risk propagation map is constructed to realize visual display and dynamic tracking of the risk.
Multi-source heterogeneous data, including network traffic data, system operation logs and database access records, is collected in real time by deploying security monitoring tools such as an IDS (intrusion detection system), a log audit system and a database audit system. The massive heterogeneous data is stored, cleaned, correlated and analyzed using Hadoop and Spark big data processing platforms, and suspicious security events and abnormal behavior patterns in the network are identified. For each identified security risk event, causal reasoning via root cause analysis is adopted to backtrack from the attack result to the attack origin, analyzing the antecedents and consequences of the event layer by layer and judging the temporal association and causal dependency between events. The enterprise's information assets and network architecture are comprehensively combed by collecting asset configuration information, the network topology and protection policy rules, building an asset library and knowledge base covering the whole scene. Each IT, OT and IoT asset in the network is discovered by asset mapping; the physical and logical location, business application, communication protocols and vulnerability information of each asset are identified; the enterprise's asset topology and dependency graphs are drawn; and cross-segment, cross-region asset connections are presented visually. Focusing on full life cycle management of data assets, a data leakage protection system is deployed to collect behavior events from the creation, storage, circulation and use of sensitive data assets. Metadata of the data assets is extracted through data lineage analysis, and a complete traceability map of each data asset from generation to use is constructed.
On the basis of the risk propagation map, real-time service monitoring data is integrated; stream computing and complex event processing are applied to associate service anomalies with security risk events in real time, dynamically deduce the potential propagation paths of risks in the asset network, and track and give early warning on the evolution of the risk.
Specifically, by deploying security monitoring tools such as an IDS intrusion detection system, a log audit system and a database audit system, multi-source heterogeneous data such as network traffic data, system operation logs and database access records is collected in real time; big data processing platforms such as Hadoop and Spark are used to store, clean, correlate and analyze the massive data, identify suspicious security events, and discover abnormal behavior patterns in the network, providing a data basis for risk tracing analysis. For each identified security risk event, a causal reasoning technique such as root cause analysis is adopted to backtrack from the attack result to the attack origin, analyzing the antecedents and consequences of the event layer by layer. Using the temporal relationships and correlations between security events, methods such as the Granger causality test from time series analysis judge whether one event is the cause of another. In addition, request-response logs can be analyzed through process mining to reveal the call dependencies between system components, finally forming a complete causal topology and event chain. The enterprise's information assets and network architecture are comprehensively combed by collecting asset configuration information, the network topology and protection policy rules, building an asset library and knowledge base covering the whole scene and forming a mechanism for asset life cycle management and configuration change management.
Asset mapping is adopted to discover each IT, OT and IoT asset in the network, identify its physical and logical location, associated business application, communication protocols and vulnerability information, draw the enterprise's asset topology and dependency graphs, visually present cross-segment and cross-region asset connections, and understand the security boundary and attack surface. Based on the asset mapping results, the vulnerability and privilege information of each asset node is mapped into an attack graph model using attack graph construction techniques. The network reachability of each host is abstracted as a directed edge of the attack graph, the vulnerable ports and services on the hosts are abstracted as nodes, and CVSS scores from the CVE vulnerability library are used to weight the danger level of the nodes, finally forming a logical attack graph with well-defined node and edge semantics. On this basis, a causal graph model such as formalized predicate logic or a Bayesian network is used to define inference rules between attack preconditions and attack consequence states, and all potential attack paths from vulnerable points to key assets are automatically analyzed through forward and backward reasoning over causal links. Focusing on full life cycle management of data assets, a Data Leakage Protection (DLP) system is deployed to collect behavior events from the creation, storage, circulation and use of sensitive data assets. Metadata of the data assets, including data source, transformation process and destination, is extracted through data lineage analysis to construct a traceability map of the data assets and restore the complete circulation chain of sensitive data from generation to final use.
Static analysis is performed on database SQL statements, ETL task scripts and program code to extract the dependency relationships between tables and fields; data access and transmission behaviors are obtained by scanning web logs and packets to reconstruct the operation timeline of data transfers; and a machine learning clustering algorithm automatically abstracts and simplifies the massive lineage relationships, finally revealing the key paths of data transfer. In addition, a data flow graph modeling method is used to formally define the data exchange interfaces between different business systems, visually display cross-system and cross-network access and sharing of sensitive data, and discover abnormal flows and illegal operations in time. In the graph construction process, a rule-based reasoning engine such as Datalog, or a cost-based shortest path algorithm, is used to deeply mine the implicit relationships between events, assets, threats and vulnerabilities; graph embedding models such as DeepWalk and Node2Vec sample a large number of node sequences from the graph via random walks, and word embedding model training yields a low-dimensional vector representation of each node, so that the semantic similarity of the vectors describes the deep relationships between entities. Finally, the map not only reveals the causal propagation process of each risk event in time, space and logic, forming a local risk propagation subgraph that intuitively answers how the risk reached key assets step by step, but also globally describes the risk topology of the whole enterprise, answering which weak points and which paths have the greatest risk propagation influence.
On the basis of the risk propagation map, real-time service monitoring data is integrated and technologies such as stream computing and complex event processing are applied to dynamically associate business anomalies with security risk events, infer the potential propagation paths of risks in the asset network in real time, predict the influence range of the next hop, and perform real-time tracking and situational awareness of risk propagation. Meanwhile, deep learning models such as graph neural networks are introduced, and models such as a GCN are applied on the risk propagation graph to learn an implicit state representation of each node. This representation incorporates, on the one hand, the inherent attributes and vulnerability features of the asset itself and, on the other hand, recursively aggregates the risk states of neighboring nodes through the graph's message-passing mechanism, forming a contextual representation of asset risk. During training of the graph neural network, an attention mechanism can be integrated so that the weight coefficients of neighbor-risk aggregation are adaptively adjusted according to the importance of different asset nodes and the relevance of the attack and dependency edges between them. After the model is trained, the subsequent propagation trend of risks can be predicted online from the real-time state of the assets and the network, and high-risk propagation paths can be warned of in advance. Finally, in the quantitative evaluation model, the risk nodes on a propagation path are scored along multiple dimensions such as asset value, data importance, and attack cost, so that high-risk propagation events are responded to in time, providing a basis for treatment decisions such as node blocking, policy optimization, defense in depth, and attack tracing.
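One round of the message passing described above can be sketched without any deep learning framework: each asset's risk state is updated from a softmax-weighted ("attention-like") aggregation of its neighbors' risk. The node risks, edge relevances, and the self-weight are illustrative assumptions; a real GCN/GAT layer would learn these weights.

```python
import math

# One message-passing round on a risk propagation graph: each asset's risk
# aggregates its neighbours' risk, weighted by a softmax over edge relevance.
# Pure-Python stand-in for a GCN/GAT layer; all numbers are hypothetical.

node_risk = {"web": 0.9, "app": 0.4, "db": 0.1, "backup": 0.05}
# incoming edges: target -> [(source, edge_relevance)]
in_edges = {
    "app": [("web", 2.0)],
    "db": [("app", 1.5), ("backup", 0.2)],
}

def softmax(ws):
    exps = [math.exp(w) for w in ws]
    total = sum(exps)
    return [e / total for e in exps]

def propagate(risk, edges, self_weight=0.6):
    nxt = dict(risk)
    for tgt, incoming in edges.items():
        attn = softmax([w for _, w in incoming])      # attention coefficients
        neigh = sum(a * risk[src] for a, (src, _) in zip(attn, incoming))
        nxt[tgt] = self_weight * risk[tgt] + (1 - self_weight) * neigh
    return nxt

updated = propagate(node_risk, in_edges)
print({k: round(v, 3) for k, v in updated.items()})
```

Nodes with no incoming edges keep their state; running several rounds lets risk from the compromised web server flow along attack/dependency edges toward the database.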
For security data acquisition and anomaly detection, an open-source IDS such as Suricata can be deployed to detect port-scanning behavior in the network in real time; traffic analysis tools such as Zeek extract TCP, UDP, and ICMP communication data between hosts; a distributed computing framework such as MapReduce computes the traffic distribution characteristics of each IP over a time window; a one-class classifier such as One-Class SVM is trained to construct a normal-behavior baseline; a neural network model such as an RNN learns the temporal pattern of each IP's communication sequence and predicts the traffic distribution at the next moment; and when the JS divergence between the actual and predicted distributions is greater than 1.2, an abnormal event is flagged. For causal analysis, the PC-stable algorithm can be used: causal dependencies among event variables are discovered through conditional-independence tests, and candidate causal graph models are scored with criteria such as AIC and BIC to construct the optimal causal structure. Multiple independent causal chains can be linked into a single causal graph through a Markov logic network (MLN), in which first-order logic formulas represent attack rules; for example, "vulExists(H, V) ∧ vulLinkHost(V, H) → execCode(H)" indicates that a host H carrying vulnerability V can be compromised. For asset mapping, the Nmap port scanner can be used with SYN half-open scanning to run full-port scans of all IP addresses in the enterprise network at a concurrency of 2000; machine learning algorithms such as decision trees and random forests then identify the service type, version, and device type behind each open port from its response-message fingerprint, with an average accuracy above 90%.
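The JS-divergence comparison of predicted versus observed traffic can be sketched as follows. Note that standard Jensen-Shannon divergence is bounded by ln 2 ≈ 0.693 in nats (1 in bits), so the threshold below is an illustrative value to be tuned on historical traffic, and the per-port traffic shares are hypothetical.

```python
import math

# Jensen-Shannon divergence between predicted and observed traffic
# distributions (per-port packet shares, illustrative numbers).

def kl(p, q):
    # Kullback-Leibler divergence in nats; terms with p_i == 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

predicted = [0.70, 0.20, 0.10]  # model's forecast of traffic shares
observed = [0.05, 0.10, 0.85]   # actual shares in this window

score = js_divergence(predicted, observed)
print(f"JS divergence: {score:.4f}")  # bounded above by ln(2) in nats
THRESHOLD = 0.3  # illustrative; tune against the normal-behavior baseline
if score > THRESHOLD:
    print("anomalous traffic distribution")
```

Because JS divergence is symmetric and bounded, it gives a stable anomaly score even when the observed distribution has ports the model assigned near-zero probability.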
For key assets, configuration parameters can be obtained remotely over protocols such as WMI and SSH, patch information and account privileges extracted, and the assets' CMDB configuration-item attributes evaluated. In attack graph construction, a depth-first search algorithm may recursively trace attack paths back from the leaf nodes of the attack graph until its root node (i.e., the initial attack surface) is found. For cyclic dependencies, Tarjan's strongly connected components algorithm for directed graphs can find all cyclic dependency clusters, each of which is abstracted into a super node. Each node's risk value may be evaluated using the CVSS scoring criteria across its eight dimensions, such as attack complexity and attack vector, multiplied by an asset-importance score to obtain the node's final risk score; the weighted average of the scores of all non-zero-risk nodes on a path is that attack path's risk score. In data tracing analysis, an ETL tool such as DataStage can periodically scan the enterprise data warehouse and extract the lineage relations of tables and fields into lineage triples of the form <source table, target table, operation type>; for example, <table_A, table_B, insert> indicates that data flows from table A to table B. A spectral clustering algorithm then groups highly related lineage relations through eigenvector decomposition of the table-to-table adjacency matrix, with each cluster constituting a key path of data flow.
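The node and path scoring just described can be expressed directly; the CVSS values and importance weights below are illustrative assumptions:

```python
# Each node's risk = CVSS score x asset-importance factor; a path's risk is
# the mean score over its non-zero-risk nodes. All numbers are hypothetical.

nodes = {
    "web":  {"cvss": 9.8, "importance": 0.6},
    "app":  {"cvss": 7.5, "importance": 0.8},
    "jump": {"cvss": 0.0, "importance": 0.3},  # no known vulnerability
    "db":   {"cvss": 8.8, "importance": 1.0},
}

def node_risk(name):
    return nodes[name]["cvss"] * nodes[name]["importance"]

def path_risk(path):
    # average over non-zero-risk nodes, per the path-scoring rule above
    nonzero = [s for s in (node_risk(n) for n in path) if s > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

path = ["web", "jump", "app", "db"]
print(f"path risk: {path_risk(path):.2f}")
```

Excluding zero-risk nodes keeps a long path through harmless hops from diluting the score of the genuinely vulnerable links.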
For complex lineage networks, a community discovery algorithm such as the Louvain algorithm can be used: the modularity of candidate partitions, a value between -1 and 1, is computed first; communities with modularity above 0.3 are treated as closely related lineage clusters; the centrality of each community's nodes is calculated; and the node with the highest centrality is the key data stream. For risk-map inference, a graph database such as the open-source gStore may be used, with the SPARQL language describing risk propagation rules. The rules are organized into a rule tree, and a backward-chaining reasoning algorithm recursively triggers inference on the rule tree from known facts to derive all implicit risk-propagation relations. For representation learning on the map, a Metapath2Vec heterogeneous graph embedding model can define meta-paths of risk propagation, such as threat - vulnerability - vulnerable asset - sensitive data; meta-path-guided random walks generate contextual relations among the different risk elements, a Skip-Gram word-embedding model vectorizes them, and each risk node is finally represented by an 80-dimensional real vector. In the risk-tracing graph neural network, intrinsic attributes of each node in the graph, such as its CVSS risk value and asset importance, can be encoded into the node's feature vector, and node features are aggregated over a 4-hop neighborhood through graph convolution, with an attention mechanism adopted for the aggregation. For different edge types, such as attack and dependency edges, different attention weights can be learned and neighbor information aggregated separately for each.
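Picking the key data stream by centrality, as described above, can be sketched with simple degree centrality over lineage triples; the tables are illustrative, and a Louvain implementation would first partition the graph into communities before this step.

```python
from collections import Counter

# Degree centrality over illustrative lineage triples: the node touching the
# most lineage relations is taken as the key data stream of its cluster.

triples = [
    ("ods.orders", "dw.orders_agg", "insert"),
    ("ods.users", "dw.orders_agg", "join"),
    ("dw.orders_agg", "rpt.daily_sales", "insert"),
    ("dw.orders_agg", "rpt.finance", "insert"),
]

degree = Counter()
for src, tgt, _op in triples:
    degree[src] += 1
    degree[tgt] += 1

key_node, key_degree = degree.most_common(1)[0]
print(key_node, key_degree)  # -> dw.orders_agg 4
```

Here the aggregation table sits on every lineage path, so protecting or monitoring it covers the bulk of the sensitive data flow.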
During model training, the asset nodes involved in historical attack events are taken as positive risk samples, and an equal number of non-attacked assets are randomly sampled as negative samples; model parameters are optimized with a binary cross-entropy loss function so that the predicted probability of risk nodes exceeds 0.5 and that of non-risk nodes falls below 0.5. After the model converges, it can be used to predict the influence range and assess the harm of unknown risk events.
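The binary cross-entropy objective mentioned above is, concretely, the following average; labels and predicted probabilities here are illustrative stand-ins for the risk/non-risk samples:

```python
import math

# Binary cross-entropy over risk (1) / non-risk (0) node predictions.
# Labels and probabilities are illustrative.

def bce(y_true, y_pred, eps=1e-7):
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip for numerical stability
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 1, 0, 0]          # attacked assets are the positive samples
preds = [0.9, 0.7, 0.2, 0.4]   # model's predicted risk probabilities
print(f"BCE loss: {bce(labels, preds):.4f}")
```

Minimizing this loss pushes risk-node probabilities toward 1 and non-risk probabilities toward 0, which yields the >0.5 / <0.5 separation the text requires.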
In this preferred embodiment, after obtaining a risk event analysis report and identifying a causal chain for each risk event, the present application uses this information to trace back the propagation path and influence range of each event. By analyzing the source of a risk, it can be inferred how the risk spreads across the network or system and which assets are affected. A risk propagation map is then constructed from these propagation paths and influence-range data. This map visually reveals the propagation process, so the security team can intuitively see how a risk propagates from one point to another and the dependencies between them. In addition, dynamic tracking allows changes and developments in risk to be monitored in real time, enabling more effective response to and management of risk events. This helps the enterprise understand the nature of its risks and also provides a way to monitor and mitigate their effects.
The asset risk tracking and tracing method comprises: obtaining network asset basic data comprising an enterprise network topology graph and an asset vulnerability risk list; drawing a system asset topology graph to identify the security risk points of key system assets and thereby form a system asset risk list; drawing a data asset flow chart from the system asset topology graph, identifying the security risks of sensitive data assets, and generating a data asset risk list; tracking the occurrence process and influence range of each risk event using big data analysis technology in combination with the system asset risk list and the data asset risk list, and compiling a risk event analysis report; and finally constructing a causal chain for each risk event from the information in the report, tracing back to the risk source, and building a risk propagation map, thereby providing the enterprise with network security protection that adapts to dynamic change and achieving effective identification and management of enterprise asset risks.
Example 2
Referring to fig. 4, an asset risk tracking and tracing device is provided in an embodiment of the present application.
In this embodiment, the asset risk tracking and tracing device includes an acquisition module 10, an identification module 20 and a construction module 30;
The acquiring module 10 is configured to acquire network asset basic data, where the network asset basic data includes an enterprise network topology graph and an enterprise asset vulnerability risk list.
More specifically, the drawing process and the application of the enterprise network topology graph are as follows:
And acquiring IP address, port number and service type information of the enterprise network equipment by adopting network scanning to obtain a comprehensive network asset scanning result. And drawing an enterprise network topological graph according to the scanning result, determining the distribution condition and the connection relation of the network assets, and providing basic data for subsequent asset mapping and risk identification. And classifying and sorting the acquired network equipment information in an automatic mode, and dividing according to the type of an operating system, the type of a manufacturer and the dimension of a network position to form a hierarchical classification system and a network asset inventory. Based on the network topology diagram, the network equipment is divided into layers according to the importance of the assets and the service relevance, and the boundaries of a core layer, a convergence layer, an access layer and the connection dependency relationship of the boundaries are defined. And identifying system loopholes and configuration defect security risks through deep analysis of the network assets, calculating quantified risk levels, and forming a network asset risk map. Asset mapping is adopted, asset attributes and association relations are automatically and synchronously updated through integration with an IT operation and maintenance management platform and a CMDB system, network asset attribute information is associated with a business system, and business dependence relations and data transmission paths are combed. Based on the asset mapping, network isolation and access control means are adopted to divide the security domain of the network asset according to the service importance, the data sensitivity and the compliance requirements, and isolation limitation among different security domains is realized. 
Through continuous network scanning and asset mapping, network asset information is dynamically updated, newly added or changed equipment is discovered in a timely manner, its security risks are evaluated, and corresponding hardening and optimization are carried out, building a dynamic, visualized network asset management and security operation system and raising the overall level of network security protection.
Through deep analysis of network assets, security risks such as system vulnerabilities and configuration defects are identified, with particular attention to network devices that lack password-complexity requirements or still use default passwords; quantitative risk levels are calculated under the CVSS vulnerability scoring system from factors such as a risk's exploitability, influence range, and degree of harm, forming a risk map of the network assets and providing decision support for subsequent security hardening and protection strategies. Using asset mapping technology, device configuration information is collected via the SNMP protocol, the WMI interface, and similar means; through integration with the IT operation and maintenance management platform, the CMDB, and other systems, asset attributes and association relations are automatically kept synchronized; the attribute information of network assets is associated with the business systems to determine each asset's business application and data flow direction, and the business dependency relations and data transmission paths are sorted out, forming a comprehensive asset-mapping view that facilitates business impact analysis and risk assessment. When scanning network assets with tools such as Nmap, full-port scans of a designated network segment can be performed at a rate of 1000 packets per second using TCP SYN scanning, TCP connect scanning, UDP scanning, and similar modes, to obtain each network device's IP address, open ports, service versions, and other information.
The scan result can be exported in XML, JSON, and other formats and parsed by a Python script to extract the key fields, which are classified and summarized by operating system type (e.g., Windows, Linux, Cisco IOS), vendor and model (e.g., Cisco, Juniper, HP), and network location (e.g., office area, production area, DMZ) to form a network asset inventory. Meanwhile, visualization tools such as Gephi and Cytoscape draw a network topology graph from the connection relations among the assets, intuitively displaying the distribution and interconnection of network devices. For vulnerability risk identification, a vulnerability scanning tool such as Nessus or OpenVAS can run security inspections on network devices to identify common vulnerabilities and configuration defects, such as default passwords, weak passwords, MS17-010, and Heartbleed. The scanner's detection rules can be updated regularly from the CVE vulnerability database, and the scan results can be evaluated comprehensively against the CVSS scoring standard along dimensions such as Attack Vector (AV), Attack Complexity (AC), User Interaction (UI), Privileges Required (PR), and Scope (S), yielding a vulnerability risk level of 0-10 points together with repair suggestions. For example, high-risk vulnerabilities scoring above 7 points must be repaired within 5 working days, and medium-risk vulnerabilities scoring 4-6 points within 10 working days.
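Parsing the exported XML with a Python script, as described above, can be sketched with the standard library alone; the XML below is a trimmed, hypothetical fragment in the shape of Nmap's output, not a real scan.

```python
import xml.etree.ElementTree as ET

# Parse an Nmap-style XML export to extract (IP, open port, service name).
# The snippet is an illustrative, trimmed example.

NMAP_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/><service name="ssh"/>
      </port>
      <port protocol="tcp" portid="443">
        <state state="open"/><service name="https"/>
      </port>
    </ports>
  </host>
</nmaprun>"""

assets = []
root = ET.fromstring(NMAP_XML)
for host in root.iter("host"):
    ip = host.find("address").get("addr")
    for port in host.iter("port"):
        if port.find("state").get("state") == "open":
            assets.append((ip, int(port.get("portid")),
                           port.find("service").get("name")))
print(assets)
```

The resulting tuples can then be grouped by OS type, vendor, and network location to build the asset inventory.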
In the asset mapping process, the SNMPv2/v3 protocol can be used to periodically collect system information, interface information, and performance indicators such as CPU and memory utilization from network devices, with collection and parsing implemented through Python's pysnmp library. For Windows servers, configuration information may be obtained remotely through the WMI interface using wmic commands. The collected data can be synchronized and compared with the IT asset management system, the Configuration Management Database (CMDB), and similar systems; time-series algorithms such as ARIMA and Prophet predict the coming week's utilization trend for resources such as CPU and memory, surface abnormal data in time, and generate resource-utilization reports that support capacity planning and optimization. For security-domain division, network assets can be partitioned by business attribute and data sensitivity into a core domain, production domain, development and test domain, office domain, internet domain, and so on; following the idea of defense in depth, security devices such as Access Control Lists (ACL), an Intrusion Prevention System (IPS), and a Web Application Firewall (WAF) are deployed at the boundaries of and between the security domains, with corresponding security policies configured. For example, between the internet and the intranet, an Nginx reverse proxy with an IP whitelist can restrict external access so that only trusted users reach internal applications; between the office network and the production network, a VPN gateway with two-factor authentication (2FA) provides user identity authentication, and VLAN division provides network isolation.
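The utilization-trend forecast above can be illustrated with simple exponential smoothing, a lightweight, dependency-free stand-in for the ARIMA/Prophet models the text names; the daily CPU samples are hypothetical.

```python
# Simple exponential smoothing as a stand-in for ARIMA/Prophet forecasting,
# applied to daily CPU utilisation samples (illustrative numbers).

def ses_forecast(series, alpha=0.5):
    """Return the one-step-ahead forecast after smoothing the series."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

cpu_util = [41.0, 43.5, 45.0, 47.2, 49.8, 52.1, 54.0]  # last 7 days, %
forecast = ses_forecast(cpu_util)
print(f"next-day CPU forecast: {forecast:.1f}%")
if forecast > 80:
    print("capacity alert: plan an upgrade")
```

A proper ARIMA fit would additionally model the clear upward trend here; the sketch only shows where the forecast plugs into the capacity-planning report.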
Meanwhile, a threat intelligence platform can collect external IP reputation data and score the risk of each access-source IP, achieving dynamic access control and reducing the risk of network attack.
More specifically, the drawing process and the application of the enterprise asset vulnerability risk list are as follows:
For the target devices and systems discovered by network asset scanning, vulnerability scanning tools such as Nessus and Nexpose are used to detect security vulnerabilities; the vulnerabilities present in the assets are identified through vulnerability feature library matching and vulnerability verification, and their harmfulness is quantitatively evaluated under the Common Vulnerability Scoring System (CVSS) to form an enterprise asset vulnerability risk list and determine the risk level of each vulnerability.
As shown in fig. 2, a scanning policy is set that designates the scanning range, scanning mode, and scanning depth, yielding the vulnerability scanning configuration parameters for the target assets. Multiple vulnerability identification techniques, including signature-based vulnerability detection, state machine based protocol analysis, and reverse engineering based patch comparison, identify the system vulnerabilities, Web vulnerabilities, and weak-password vulnerabilities present in the target assets across the dimensions of operating system, application software, database, and network equipment. From the identification results, each security vulnerability's name, number, type, and harm attributes are acquired. The identified vulnerabilities are then comprehensively evaluated under the industry-standard CVSS scoring system along the dimensions of attack vector, attack complexity, privileges required, user interaction, and scope of influence, yielding a CVSS score for each vulnerability, from which its risk level is judged. Finally, the vulnerability scanning results and risk assessment information are integrated to generate the enterprise asset vulnerability risk list, which records each asset's vulnerability count, vulnerability types, and risk distribution; sorting by risk level determines which vulnerabilities require focused repair.
Specifically, for the target devices and systems discovered by network asset scanning, vulnerability scanning tools such as Nessus and Nexpose comprehensively detect security vulnerabilities. A scanning strategy specifies the scanning range, mode, and depth, thereby realizing vulnerability scanning of the target assets. Common scanning strategies include full-port scanning, system vulnerability detection, and Web application vulnerability detection, with an appropriate scanning mode (such as SYN half-open or full TCP connect) and detection intensity (such as lightweight, normal, or deep) chosen for the network environment and system type. During scanning, the tool interacts with the target system, matches information against its built-in vulnerability feature library, and judges whether known vulnerabilities exist by sending probe packets and analyzing the responses. The scanner applies multiple identification techniques, such as signature-based detection, state machine based protocol analysis, and reverse engineering based patch comparison, to identify system vulnerabilities, Web vulnerabilities, weak passwords, and other security flaws across operating systems, application software, databases, and network equipment, acquiring each vulnerability's name, number, type, harm, and other attributes to form a detailed vulnerability scanning report. Each discovered vulnerability is then comprehensively evaluated against the industry-standard CVSS along dimensions such as attack vector, attack complexity, privileges required, user interaction, and scope of influence.
Taking CVSS v2 as an example, the base score is computed as base score = ((0.6 × impact) + (0.4 × exploitability) − 1.5) × f(impact), where f(impact) is 0 when the impact subscore is 0 and 1.176 otherwise; this calculation yields the vulnerability's CVSS score and quantifies its harmfulness. According to the CVSS score, security vulnerabilities can be classified into risk levels: for example, a score above 9 points is critical, 7-8 points high risk, 4-6 points medium risk, and 0-3 points low risk, and each vulnerability is labeled accordingly. The scanning results and risk assessment information are integrated to generate the enterprise asset vulnerability risk list, recording each asset's vulnerability count, types, and risk distribution; sorting by risk level determines which vulnerabilities require focused repair. On this basis, the priority order of repairs is determined from factors such as each vulnerability's risk level, business impact, and repair difficulty, and a staged repair plan is formulated. Finally, targeted repair and protection schemes are implemented according to priority, such as deploying security patches, upgrading software versions, optimizing security configurations, and deploying protective equipment; repair progress is tracked continuously, with periodic retesting to ensure that the vulnerabilities are effectively fixed and the security risk facing the enterprise is reduced.
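The CVSS v2 base-score calculation can be sketched directly from the published v2 equations; the metric weights below are the standard v2 values, and the example vector is illustrative.

```python
# CVSS v2 base-score sketch using the standard published metric weights.

AV = {"L": 0.395, "A": 0.646, "N": 1.0}     # Access Vector
AC = {"H": 0.35, "M": 0.61, "L": 0.71}      # Access Complexity
AU = {"M": 0.45, "S": 0.56, "N": 0.704}     # Authentication
CIA = {"N": 0.0, "P": 0.275, "C": 0.660}    # Conf/Integ/Avail impact

def cvss2_base(av, ac, au, c, i, a):
    impact = 10.41 * (1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a]))
    exploitability = 20 * AV[av] * AC[ac] * AU[au]
    f_impact = 0.0 if impact == 0 else 1.176
    return round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)

# e.g. AV:N/AC:L/Au:N/C:C/I:C/A:C (remote, trivially exploitable, full impact)
print(cvss2_base("N", "L", "N", "C", "C", "C"))  # -> 10.0
```

Scores from this function feed the risk-level buckets above (critical, high, medium, low) and, in turn, the repair-deadline policy.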
When performing vulnerability scanning, the Nmap tool can run a TCP SYN scan against the target IP address range, sending SYN packets to target ports and judging each port's state from the response; the scan rate can be set to 1000 packets per second to rapidly identify the target assets' network topology and port information. A specialized scanner such as Nessus then targets the services behind the open ports, such as Web, database, and mail services, using its built-in vulnerability plugin library to send specific probe requests and analyze the response data, judging whether flaws such as SQL injection, XSS cross-site scripting, or remote command execution exist. For example, for Web applications, a regular-expression matching algorithm can detect whether a response contains telltale error messages, such as database or Web container errors, to identify potential injection points; for operating systems, versions can be judged from banner information and fingerprint features, and PoC verification scripts from the vulnerability knowledge base can simulate attack requests to confirm that a vulnerability exists. Scanning can proceed in two stages: preliminary detection mainly identifies common high-risk vulnerabilities, with the scan time kept within one hour; deep detection then verifies the suspected vulnerabilities from the first stage using exploit code, with tools such as Sqlmap and Metasploit attempting to obtain system privileges to confirm exploitability.
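The regular-expression check for leaked database errors can be sketched as follows; the error signatures are common real-world patterns, but the list and the sample response body are illustrative.

```python
import re

# Regex check for database error strings leaking into an HTTP response,
# used to flag potential SQL injection points. Signatures and the sample
# response are illustrative.

ERROR_SIGNATURES = [
    r"You have an error in your SQL syntax",  # MySQL
    r"ORA-\d{5}",                             # Oracle
    r"unterminated quoted string",            # PostgreSQL
]
PATTERN = re.compile("|".join(ERROR_SIGNATURES), re.IGNORECASE)

def looks_injectable(response_body):
    """True if the response leaks a database error signature."""
    return PATTERN.search(response_body) is not None

resp = "<html>Warning: ORA-01756: quoted string not properly terminated</html>"
print(looks_injectable(resp))  # -> True
```

A hit only marks a candidate injection point; per the two-stage process above, the deep-detection stage would then confirm exploitability with PoC requests.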
The scan results can be exported as standardized XML reports and displayed and managed uniformly through a custom-developed vulnerability management platform, built on the Python Django framework with a Vue.js front end and vulnerability data stored in Elasticsearch; the platform calls a CVSS scoring interface to score the risk of every vulnerability and renders a vulnerability risk matrix that intuitively presents each business system's vulnerability distribution. The platform can also report the proportions of each vulnerability class, for example 30% high risk, 50% medium risk, and 20% low risk, and rank the assets by combining an asset-importance assessment algorithm, computing each asset's risk value as risk value = asset importance × vulnerability risk score × number of vulnerabilities; repair plans are formulated in descending order of risk value and pushed to the relevant operations and maintenance personnel for handling. During repair, a second scan should verify that each patch has taken effect and the vulnerability is eliminated.
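The asset ranking formula above can be expressed in a few lines; the sample assets and weights are illustrative assumptions:

```python
# Asset risk ranking per the formula above:
# risk value = asset importance x vulnerability risk score x vulnerability count.
# Sample assets are illustrative.

assets = [
    {"name": "crm-db",     "importance": 1.0, "vuln_score": 8.8, "vuln_count": 3},
    {"name": "web-portal", "importance": 0.7, "vuln_score": 9.8, "vuln_count": 1},
    {"name": "test-vm",    "importance": 0.2, "vuln_score": 7.5, "vuln_count": 6},
]

for a in assets:
    a["risk"] = a["importance"] * a["vuln_score"] * a["vuln_count"]

ranked = sorted(assets, key=lambda a: a["risk"], reverse=True)
for a in ranked:
    print(f'{a["name"]:<11} risk={a["risk"]:.1f}')
```

Note how the low-importance test VM, despite carrying the most vulnerabilities, still outranks the portal with a single critical flaw: multiplying by the raw vulnerability count is a deliberate (and debatable) weighting choice in the formula.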
By acquiring network devices' IP addresses, port numbers, and service types, the application ensures comprehensive identification and accurate mapping of enterprise network assets, providing detailed basic data for subsequent risk assessment. The enterprise network topology graph drawn from this information offers an intuitive view for visualizing the network structure and spotting potential security weaknesses, making security analysis more efficient and accurate. In addition, vulnerability scanning tools such as Nessus and Nexpose detect the assets' security vulnerabilities, and the discovered vulnerabilities are quantitatively evaluated under the CVSS vulnerability scoring system, strengthening the scientific rigor and systematic nature of vulnerability management and ensuring that the enterprise can prioritize the vulnerabilities that most threaten network security.
The identification module 20 is configured to draw a system asset topology graph from the network asset basic data, identify the security risk points present in key system assets according to the system asset topology graph, and obtain a system asset risk list;
to draw a data asset flow chart from the system asset topology graph and identify the security risks present in sensitive data assets according to the data asset flow chart, obtaining a data asset risk list;
and to identify the occurrence process and influence range of each risk event on the system asset risk list and the data asset risk list according to a preset big data analysis method, obtaining a risk event analysis report.
As a preferred embodiment of Example 2, drawing a system asset topology graph from the network asset basic data and identifying the security risk points present in key system assets according to the graph to obtain a system asset risk list specifically includes:
For the network asset basic data, a system asset topology graph is drawn with an architecture design tool such as Enterprise Architect through application dependency analysis and architecture analysis; the dependency relations and data flow directions between systems are determined, and the security risk points present in key system assets are identified with business-importance assessment and risk assessment methods, forming a system asset risk list.
Tools such as Application Insights and Dynatrace extract the dependency relations among services by analyzing API calls, configuration files, and database connection information in the application code, and a directed acyclic graph of service nodes and dependency edges is constructed. From the application dependencies, the Enterprise Architect design tool draws the system asset topology across the business, application, data, and technology views, presenting the deployment relations, interface relations, and data flows between system components. On the basis of identifying the key assets in system operation, the STRIDE threat modeling method identifies the security threats present in the system architecture across its six dimensions, and the risk value of each asset is calculated. In response to the risk assessment results, a microservice architecture and containerization realize logical isolation between different services, sensitive data is desensitized and encrypted, and a high-availability architecture with multi-site active-active deployment improves the system's disaster tolerance. While implementing architecture security hardening, a normal-behavior baseline is established with machine learning algorithms over log analysis and traffic analysis, abnormal deviation patterns are identified, and the system's running state is monitored in real time.
Specifically, for the network asset basic data, application dependency analysis with tools such as Application Insights and Dynatrace comprehensively sorts out the dependency relations of the enterprise's application systems: API calls, configuration files, database connections, and similar information in the application code are analyzed to extract inter-service dependencies, and, combined with the actual call-chain data collected by APM and other monitoring tools, a Directed Acyclic Graph (DAG) of service nodes and dependency edges is constructed, forming an application dependency topology matrix that lays the foundation for subsequent architecture analysis. Building on the dependency analysis, the Enterprise Architect design tool visually models the system architecture, and the system asset topology graph is drawn across multiple dimensions, such as the business, application, data, and technology views, clearly presenting the deployment relations, interface relations, and data flow directions between system components so that the architecture is fully understood and potential defects and risk points are discovered. From the system asset topology graph, the key assets in system operation are identified, including core business applications, key databases, and important middleware; business importance is evaluated, and, combining qualitative and quantitative approaches, importance scores are computed from aspects such as business revenue, customer impact, and regulatory compliance to form a key-asset importance matrix. Based on the identification of the key system assets, a risk assessment method comprehensively analyzes the security risks the system assets face.
Using the STRIDE threat modeling method, the security threats existing in the system architecture are identified from the six dimensions of Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege, and the risk value of each asset is calculated from aspects such as the vulnerability of the asset itself, the likelihood of threat exploitation and the impact of a security event. The risk value is calculated as risk value = occurrence probability × impact degree, where the occurrence probability and the impact degree are each given a quantized score of 1-5 points according to the characteristics of the risk factors, forming a 5×5 risk matrix. A risk value in the matrix of 15 points or more is high risk, 8-14 points is medium risk, and less than 8 points is low risk, and a risk treatment scheme is formulated in a targeted manner accordingly. Aiming at the risk assessment result, and combining business requirements and security requirements, a targeted security reinforcement and protection scheme is formulated. A micro-service architecture and containerization technology are adopted to realize logical isolation among different services, and a service mesh is used to realize authentication and encrypted communication between services through Sidecar proxies. Sensitive data is desensitized and encrypted with technical means such as hashing, masking and encryption, applied across links such as data acquisition, transmission, storage, processing and application.
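The 5×5 matrix and the 15/8-point thresholds above can be captured in a short helper; a minimal sketch:

```python
def risk_level(probability: int, impact: int) -> str:
    """Classify a risk on the 5x5 matrix described above:
    risk value = probability x impact; >=15 high, 8-14 medium, <8 low.
    Both inputs are quantized scores from 1 to 5."""
    if not (1 <= probability <= 5 and 1 <= impact <= 5):
        raise ValueError("scores must be quantized to 1..5")
    value = probability * impact
    if value >= 15:
        return "high"
    if value >= 8:
        return "medium"
    return "low"

print(risk_level(5, 3))  # high (15 points)
```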
A high-availability architecture design with geographically distributed active-active deployment improves the disaster recovery capability of the system through deployment modes such as multi-site active-active and remote multi-center, combined with mechanisms such as load balancing, data synchronization and failover. Meanwhile, active defense is implemented on key business and data, and potential threats are discovered and blocked in time. During architecture security reinforcement, security monitoring and risk assessment are continuously carried out, and the running state of the system is monitored in real time through technical means such as log analysis and traffic analysis. A normal behavior baseline is established from historical data with unsupervised learning algorithms such as Isolation Forest, One-Class SVM and LOF, and abnormal deviation patterns are identified; classification models are trained with supervised learning algorithms such as SVM, Random Forest and XGBoost on labeled known-anomaly data to judge new data; and time-series analysis algorithms such as ARIMA, Prophet and LSTM perform trend prediction and anomaly detection on the time-series data of system indicators. In the application dependency analysis process, an AppDynamics platform can be used: by injecting probes into the application code, performance data and topology dependencies of key business transactions such as method calls, database queries and message queues are collected in real time.
For example, HTTP requests and responses in the application are captured, information such as the URL, parameters and response status code is extracted to judge the call relationships among services, and the execution of SQL statements is analyzed, with indicators such as database read/write counts and elapsed time counted to find potential performance bottlenecks. Meanwhile, statistical algorithms such as correlation analysis and frequent itemset mining are used to identify association rules among services from massive transaction logs, for example "the probability that service A calls service B is 80%" or "the response latency of service C has the largest influence on service D", forming a service dependency graph. In architecture security risk assessment, the FAIR (Factor Analysis of Information Risk) framework may be employed to quantify risk from the two dimensions of Threat Event Frequency (TEF) and loss expectancy (LEF). First, the occurrence probability of a specific type of threat is estimated according to historical security event data and threat intelligence to obtain the annual threat event frequency. The direct economic and indirect losses that may be incurred when the threat event occurs are then evaluated, and the single-event loss expectancy is calculated. Finally, the TEF and LEF values are substituted into the risk calculation formula risk = TEF × LEF to obtain the annualized risk value. For example, if a data leakage event occurs twice per year and the economic loss caused by a single event is 500,000 yuan, the annualized risk value is 1,000,000 yuan; such an event belongs to a high risk level, and measures such as data desensitization and access control should be adopted with priority for prevention and control.
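The FAIR-style calculation above reduces to a one-line product; a minimal sketch reproducing the worked example in the text:

```python
def annualized_risk(tef: float, lef: float) -> float:
    """FAIR-style annualized risk: Threat Event Frequency (events/year)
    multiplied by the single-event loss expectancy (currency units)."""
    return tef * lef

# Worked example from the text: 2 events/year x 500,000 yuan per event
print(annualized_risk(2, 500_000))  # 1000000
```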
In implementing micro-service architecture security reinforcement, a service mesh platform such as Istio may be used to provide fine-grained traffic control and security protection by deploying Sidecar proxy containers in the Kubernetes cluster and taking over inter-service network traffic. For example, mutual TLS authentication is enabled so that communication between services is encrypted and cannot be intercepted; role-based access control rules are set for service consumers to strictly limit access by unauthorized services; and the availability and stability of services are improved through mechanisms such as circuit breaking, rate limiting and degradation. Meanwhile, using the observability features of the service mesh, service call indicators such as QPS, latency and error rate are monitored in real time, with visualization and alerting realized via tools such as Prometheus and Grafana. In data security protection, a Format-Preserving Encryption (FPE) algorithm can be adopted to perform equal-length replacement of sensitive data, so that the desensitized data still satisfies the format requirements of business applications. For example, to desensitize mobile phone numbers, a regular expression such as "\d{3}\d{4}\d{4}" can match the 11-digit phone number format, and an AES-based algorithm then generates random digits of the corresponding length as replacements; the replaced phone number is still an 11-digit number, but not a real one, effectively preventing privacy leakage. When selecting an abnormal behavior detection algorithm, a real-time data processing framework based on Spark Streaming can be adopted to analyze massive data such as system logs and network traffic in real time, using distributed data structures such as DataFrame and Dataset in the streaming jobs to facilitate data transformation and statistical analysis.
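The format-preserving desensitization above can be illustrated with a simple random-digit replacement; this sketch is an illustrative stand-in for true FPE (no key-driven cipher, the `\b\d{11}\b` pattern and seeding are assumptions), but it shows the equal-length, format-preserving substitution the text describes.

```python
import random
import re

def desensitize_phone(text: str, seed: int = 0) -> str:
    """Replace every 11-digit mobile number with random digits of the
    same length, preserving the numeric format. Illustrative stand-in
    for a true format-preserving encryption (FPE) scheme."""
    rng = random.Random(seed)
    def repl(match):
        # one random digit per original digit: length and format preserved
        return "".join(rng.choice("0123456789") for _ in match.group())
    return re.sub(r"\b\d{11}\b", repl, text)

print(desensitize_phone("tel:13812345678"))
```

The output remains an 11-digit string in place, so downstream format validation keeps passing on the desensitized value.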
An isolation forest is constructed with the Isolation Forest algorithm: multiple isolation trees are built by recursively and randomly partitioning the data on its attributes, and the average path length of each sample point across the trees is calculated; the shorter the path length, the higher the anomaly score. When the anomaly score exceeds a set threshold, such as 0.6, an anomalous event is determined and an alarm is triggered. Meanwhile, deep learning algorithms such as LSTM are used to model the time-series data of system indicators such as CPU utilization and memory occupancy, and the accuracy and timeliness of anomaly detection are continuously improved through model training and parameter tuning. For example, with a Split-Brain autoencoder model, an LSTM layer and several fully connected layers are used in the encoder, a symmetric structure in the decoder restores the input data, and the reconstruction error measures the degree of abnormality of the data; when the error exceeds the normal value by 3 standard deviations, an abnormal time point is determined.
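The 3-standard-deviation reconstruction-error rule above can be sketched independently of any particular model; the error series here is hypothetical, standing in for per-timestep autoencoder reconstruction errors.

```python
import statistics

def flag_anomalies(errors, k: float = 3.0):
    """Return indices of points whose reconstruction error exceeds the
    mean by k standard deviations (the 3-sigma rule described above)."""
    mu = statistics.mean(errors)
    sigma = statistics.stdev(errors)
    return [i for i, e in enumerate(errors) if e > mu + k * sigma]

# 20 normal timesteps followed by one spike
print(flag_anomalies([1.0] * 20 + [100.0]))  # [20]
```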
In this preferred embodiment, the application dependency analysis tools Application Insights and Dynatrace are used to deeply analyze the API calls, configuration files and database connection information in the application code, so that the dependency relationships among services are extracted, providing a solid foundation for understanding how the system components interact. Then, combining the Enterprise Architect architecture design tool, the network asset basic data and these dependency relationships, a system asset topology graph showing the data flow directions is drawn, so that the deployment relationships, interface relationships and data flow directions of each component in the system are clear at a glance. On this basis, the assets playing a key role in the operation of the system are identified, and security threat analysis is performed with the STRIDE threat modeling method, systematically identifying the potential risk points of the assets. Finally, by summarizing and evaluating the potential risk points, a system asset risk list is formed, which not only clarifies the risk status of the assets but also provides a basis for subsequent risk management and mitigation measures, markedly improving the enterprise's control over the security of key system assets.
As a preferred implementation of the second embodiment, drawing a data asset flow graph according to the system asset topology graph, and identifying the security risks existing in sensitive data assets according to the data asset flow graph to obtain a data asset risk list, specifically comprises:
aiming at the data flows, drawing a data asset flow graph with the data flow diagram (DFD) method using a Visio flowchart tool, determining the circulation paths and access conditions of data within the enterprise, and identifying the security risks existing in sensitive data assets according to data classification and risk assessment methods to form a data asset risk list.
A data asset flow graph within the enterprise is drawn with the data flow diagram (DFD) method and a Visio flowchart tool, yielding the circulation paths of data among different business systems, departments and staff. According to the data circulation paths and with reference to common data classification methods, the data assets are classified along the dimensions of data source, confidentiality requirement and importance. By comparison with laws and regulations, industry standards and enterprise policy requirements, data assets are divided into different security protection levels: public, internal, sensitive and confidential. According to the data flow graph and the classification and grading results, the access conditions of data assets during circulation are analyzed. By identifying the security risk points of data leakage, unauthorized access and data tampering in each link, the likelihood and impact of each risk point are evaluated, and the risk value of the data asset is calculated. For the identified data security risk points, common qualitative and quantitative assessment methods in the information security field, such as OCTAVE and FRAP, are used as references. By evaluating from both the likelihood of occurrence and the degree of impact, it is determined whether the risk level exceeds an acceptable threshold. If a high-risk link involving a sensitive data asset is found, a specific security protection scheme is formulated from a full data lifecycle perspective, and corresponding security protection measures are adopted according to the characteristics of each stage: data acquisition, transmission, storage, access, processing and destruction.
Specifically, for the direct data flow relationships of different systems in the system asset topology graph, the data flow diagram (DFD) method and flowchart tools such as Visio are used to draw the data asset flow graph within the enterprise, clarifying the circulation paths of data among different business systems, departments and staff; through analysis of the data flows, key nodes such as the source, destination and processing steps of each data asset are identified, providing a foundation for subsequent data security risk analysis. In drawing the data flow graph, the data assets are combed in a structured manner with reference to common data classification methods, for example classification based on dimensions such as data source, confidentiality requirement and importance; according to the requirements of laws and regulations, industry standards and enterprise policies, the data is divided into different security protection levels such as public, internal, sensitive and confidential, the control requirements of each class of data asset are defined, a data classification framework for the enterprise is formed, and a basis for formulating the data security policy is provided. According to the data flow graph and the classification and grading results, the access conditions of data assets during circulation are analyzed, possible security risk points such as data leakage, unauthorized access and data tampering in each link are identified, the likelihood and impact of each risk point are evaluated, and the risk value of the data asset is calculated.
Risk levels are represented with a color scale: red represents high risk, yellow medium risk and green low risk; the size of each risk point represents the likelihood of occurrence, and hovering the mouse displays the detailed information of each risk point, finally forming a data asset risk heat map that visually presents the high-risk areas. For the identified data security risk points, qualitative and quantitative evaluation methods commonly used in the information security field, such as OCTAVE and FRAP, evaluate the risk along the two dimensions of likelihood and impact. Qualitative evaluation scores risks on a 1-5 scale by means of risk factor questionnaires and brainstorming, forming a risk matrix diagram. Quantitative evaluation uses the annual loss expectancy (ALE) formula ALE = Σ(asset value × risk occurrence probability × vulnerability exposure coefficient), and risk priorities are ordered by ALE value. If the risk level exceeds the acceptable threshold, corresponding security protection measures must be formulated; if the risk level is lower, general management and technical controls can be adopted according to the cost-effectiveness principle. For high-risk links involving sensitive data assets, specific security protection schemes are formulated from a full data lifecycle perspective. In the data acquisition stage, sensitive data is desensitized with technologies such as data masking and pseudonymization to hide the original information. In the data transmission stage, encryption protocols such as SSL/TLS prevent data from being stolen during network transmission. In the data storage stage, data at rest is protected with technologies such as transparent data encryption and column-level encryption.
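The ALE formula above is a straightforward sum; a minimal sketch, with asset values, probabilities and exposure coefficients chosen purely for illustration:

```python
def annual_loss_expectancy(assets) -> float:
    """ALE = sum(asset value x annual risk occurrence probability x
    vulnerability exposure coefficient) over all assets at risk."""
    return sum(value * probability * exposure
               for value, probability, exposure in assets)

# Hypothetical assets: (value, annual probability, exposure coefficient)
portfolio = [(100_000, 0.1, 0.5), (50_000, 0.2, 1.0)]
print(annual_loss_expectancy(portfolio))
```

Sorting risks by this value gives the priority ordering described in the text.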
In the data access stage, methods such as role-based access control and the principle of least privilege strictly limit data access permissions. In the data processing stage, privacy-preserving technologies such as secure multi-party computation and homomorphic encryption allow data analysis and mining while protecting data confidentiality. In the data destruction stage, repeated overwriting and physical shredding ensure that discarded data cannot be recovered. Meanwhile, through technical means such as a data leakage prevention (DLP) system and database auditing, access to sensitive data is monitored and audited in real time, and abnormal operations are discovered and blocked in time. Based on the data asset risk assessment, the identified data security risks are summarized, and a data asset risk list for the enterprise is generated by combining factors such as risk level, scope of impact and remediation difficulty. When formulating the risk treatment plan, different risk treatment strategies such as risk avoidance, risk mitigation, risk transfer and risk acceptance are formulated by weighing factors such as the cost-benefit, technical feasibility and business impact of the response. Meanwhile, data security risk assessments and audits are carried out periodically, new data security risks are continuously identified and assessed, and the data security protection strategy is optimized to ensure the security and controllability of data assets. In drawing the data flow graph, the Microsoft Visio tool may be used to trace the path of data through the enterprise by dragging data-flow-diagram elements such as external entities, data flows, processes and data stores.
For example, the data transmission process from service system A to database B and then to application C uses arrowed connectors to represent the data flow direction, annotated with attributes such as data content and flow volume. For complex data flow diagrams, a hierarchical drawing mode can be adopted: first a top-level overview diagram (Level 0) is drawn, then it is refined layer by layer into sub-process diagrams (Level 1/2/3). In data classification and grading, with reference to standards such as NIST SP 800-53, data can be classified into three protection levels (high, medium, low) along the three dimensions of confidentiality, integrity and availability. For example, customer private data such as ID card numbers and bank card numbers has high confidentiality requirements, business data such as order amounts and stock quantities has medium integrity requirements, and public data such as company news and product manuals has low availability requirements. A decision-tree approach is adopted, classifying each data item step by step through a series of judgment conditions to finally form a data classification matrix. In the risk assessment process, the STRIDE threat modeling method can be used to identify the security threats faced by the data from the six dimensions of spoofing, tampering, repudiation, information disclosure, denial of service and elevation of privilege. For example, a stolen database administrator account belongs to the elevation-of-privilege threat, while an employee mis-sending a mail containing sensitive information belongs to the information-disclosure threat.
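The decision-tree style classification above can be sketched with a toy rule; the 1-3 CIA scoring and the "highest dimension wins" rule are assumptions for illustration only, not the NIST SP 800-53 mapping.

```python
def classify_data(confidentiality: int, integrity: int, availability: int) -> str:
    """Toy decision rule: map CIA scores (1=low, 2=medium, 3=high) to a
    protection level by the highest-rated dimension. The scoring scheme
    is a hypothetical simplification for illustration."""
    level = max(confidentiality, integrity, availability)
    return {3: "high", 2: "medium", 1: "low"}[level]

print(classify_data(3, 1, 1))  # high: e.g. an ID card number (high confidentiality)
```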
With a qualitative assessment method, security specialists score each threat by its occurrence probability and impact to form a 5×5 risk matrix, where a score of 1 represents Very Low, 2 Low, 3 Medium, 4 High and 5 Very High. Combined with the importance weight of the data asset, the risk value faced by each data item is calculated as RiskScore = Probability × Impact × Weight. For high-risk sensitive data, a Format-Preserving Encryption (FPE) algorithm can be adopted during desensitization to perform equal-length substitution on structured data while maintaining the original data format. For data encryption, the national SM4 block cipher can be adopted: data is encrypted block by block with a 128-bit key, each block is 128 bits long, and the encryption process comprises 32 rounds of iteration, each performing nonlinear transformation, linear transformation and round-key addition, effectively resisting differential and linear attacks. During data security auditing, a rule-based detection engine can periodically scan database operation logs, finding suspicious behavior through preset audit rules (such as sensitive-table access or bulk data export). Meanwhile, a machine learning algorithm establishes a user behavior baseline, identifying abnormal behavior patterns through models such as clustering and classification.
For example, with a K-Means clustering algorithm, user behaviors are divided into normal and abnormal classes according to features such as database session duration, volume of data accessed and operation type; when the behavior of a session deviates from the center of the normal class by more than a set threshold (such as 2 standard deviations), it is judged abnormal and a security alarm is triggered. When formulating the data security policy, the risk minimization principle is followed: strict controls such as prohibiting outbound transfer, enforcing encryption and frequent auditing are adopted for high-risk data, while relatively loose policies can be adopted for low-risk data, balancing business flexibility against security compliance.
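The "deviation from the normal-class center by more than 2 standard deviations" rule above can be sketched directly; the session features and their values here are hypothetical, standing in for the normal cluster a K-Means fit would produce.

```python
import math
import statistics

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def flag_sessions(normal, candidates, k: float = 2.0):
    """Flag candidate sessions whose Euclidean distance from the
    normal-class centroid exceeds the mean normal distance by k
    standard deviations (the 2-sigma rule described above)."""
    center = centroid(normal)
    distances = [math.dist(p, center) for p in normal]
    threshold = statistics.mean(distances) + k * statistics.stdev(distances)
    return [i for i, p in enumerate(candidates) if math.dist(p, center) > threshold]

# Hypothetical session features: (duration_s, rows_accessed, operation_types)
normal_sessions = [(10, 100, 3), (12, 110, 3), (11, 95, 2), (9, 105, 3), (10, 98, 3)]
print(flag_sessions(normal_sessions, [(11, 100, 3), (300, 50000, 9)]))  # [1]
```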
In this preferred embodiment, the present application outlines the path of data flowing through the system by tracking the start and end points of each data flow. With this path information, the data flow diagram (DFD) method is used to draw a detailed data flow graph that visually presents an overall view of the data flows. The data flows in the graph are then classified and graded, identifying abnormal data flows that may indicate potential security risks. These abnormal data flows undergo in-depth risk assessment to identify possible security risk points during data circulation. Finally, all identified security risk points are summarized into an exhaustive data asset risk list. The risk list records the specific information of each security risk point in detail, including its position in the data asset flow graph and the potential security threats, providing the enterprise with a clear data security risk management view and helping it take targeted data protection measures and strengthen data security.
As a preferred implementation of the second embodiment, identifying, according to the preset big data analysis method, the occurrence process and influence range of each risk event from the system asset risk list and the data asset risk list to obtain a risk event analysis report specifically comprises:
acquiring risk event information existing in the enterprise assets, including intrusion detection alerts, virus infections and data leakage, according to the mapping results of the system asset risk list and the data asset risk list; collecting and classifying the risk events; analyzing them with big data techniques to mine the attack means and attack paths behind the events; and tracking the occurrence process and influence range of each event to form a risk event analysis report.
As shown in fig. 3, the data packets, traffic and session information of the network layer are collected in real time by deploying security monitoring devices such as an intrusion detection system, antivirus software and a data leakage prevention system. Suspicious security events are identified from the session information with regular expression matching and signature detection methods. The security event data is preprocessed and its features extracted with a big data processing platform comprising Hadoop and Spark, using the MapReduce and RDD parallel computing models. Noise data is filtered through the ETL operations of data cleaning, data transformation and data reduction, and unstructured data is converted into structured form. The structured security event data is intelligently analyzed from multiple dimensions with machine learning algorithms. Through cluster analysis, event sets of similar attack activities are identified. Association rule mining algorithms, including Apriori and FP-Growth, find association rules among event attributes. By docking with a threat intelligence platform, the security events detected in the enterprise are correlated with external threat intelligence data on malicious IPs, domain names, samples and attack tactics, the origin of each event is traced, and the degree of impact and scope of harm of the event are judged.
Specifically, according to the mapping results of the system asset risk list and the data asset risk list, the risk event information existing in the enterprise assets is obtained. By deploying security monitoring devices such as an intrusion detection system, antivirus software and a data leakage prevention system, the data packets, traffic and session information of the network layer, the processes, files, registry entries and accounts of the host layer, the user behaviors, business operations and abnormal errors of the application layer, and external threat intelligence data such as malicious IPs, domain names, samples and attack methods are collected in real time. Suspicious security events are identified with methods such as regular expression matching and signature detection, and automatically classified by attributes such as threat level, attack stage and target asset to form a structured event data set. The collected massive security event data is preprocessed and its features extracted with big data processing platforms such as Hadoop and Spark, using parallel computing models such as MapReduce and RDD; noise data is filtered through ETL operations such as data cleaning, data transformation and data reduction, unstructured data is converted into structured form, and a feature selection algorithm picks the key attributes that best characterize the events, constructing a data set suitable for mining analysis. On the big data analysis platform, the security events are intelligently analyzed from multiple dimensions with machine learning algorithms.
Common event clustering dimensions and distance metrics are adopted. For clustering by attack type, the attack types of events, such as scanning, injection and vulnerability exploitation, are converted into 0-1 vectors with one-hot encoding, and similarity between events is measured with Euclidean distance. For clustering by attack source, the attack-source IP address of each event is represented numerically, and IP similarity is measured with cosine distance. For clustering by target asset, the target asset types of events, such as server, database and application system, are given hierarchical vector representations, and asset similarity is measured with Jaccard distance. Through cluster analysis, event sets of similar attack activities are identified. For each class of security event, association rule mining algorithms such as Apriori and FP-Growth discover association rules among event attributes, for example that certain types of attacks usually cause specific system anomalies; causal relationships among events are then judged, and the path of the attack chain is deduced.
A graph-based heterogeneous network analysis method is adopted: the attacker IP, the attack event and the victim asset in each event record are extracted as graph nodes with different node-type attributes; directed edges are established between related nodes according to information such as timestamps, event types and attack methods in the event records, describing the association relationships and ordering among nodes; and weight attributes are set on the edges, with the danger level, occurrence frequency and similar factors of the event determining the weight and representing the association strength. The complex associations between event nodes are stored in graph databases such as Neo4j and JanusGraph, and graph algorithms such as PageRank and shortest paths calculate the importance and correlation of attack events in the network, revealing the profiles, attack capabilities and attack preferences of the attackers behind the events. In the event analysis process, by docking with a threat intelligence platform, the security events detected within the enterprise are correlated with external threat intelligence data such as malicious IPs, domain names, samples and attack tactics, the origin of each event is traced, and the degree of impact and scope of harm of the event are judged. Situational awareness technology integrates internal and external security event data and depicts the overall network security threat situation faced by the enterprise.
According to the event analysis results, a multi-dimensional security event analysis report is output. From a macroscopic view, it presents the attack type distribution, attack source distribution and attack means statistics suffered by the enterprise within a period of time, revealing the overall trend of security threats; from a microscopic view, it presents the detailed process of each major security event, including its attack chain, influence range, loss evaluation and disposal measures, so that security managers can comprehensively grasp the security event situation and guide the optimization of the security protection strategy. A real-time security event monitoring and early-warning mechanism is established: the event analysis model is deployed to a stream processing engine such as Flink or Storm to compute on newly collected security event data in real time. Statistical anomaly detection algorithms such as Z-Score and MAD calculate how far a sample deviates from the mean to judge whether it is abnormal; high-dimensional anomaly detection algorithms such as PCA and KNN find abnormal points in the high-dimensional event space through dimensionality reduction or neighborhood analysis; and time-series anomaly detection algorithms such as S-H-ESD and ARIMA find abnormal time points by analyzing the period, trend and residual of the time-series data.
User behavior modeling analysis counts behavior patterns such as login time, operation frequency and resource usage to construct user profiles; clustering methods divide users into different groups, with the overall characteristics of each group serving as the standard for anomaly judgment; by mining the time sequence of user behaviors, association rules and sequential patterns among a user's operations at different time points are discovered, and based on these frequent patterns and association rules, it is judged whether a user's behavior trajectory is abnormal. According to preset threshold rules, security alarms of different grades are automatically triggered, and security operations personnel are notified by mail, SMS or work order to carry out emergency handling, minimizing the loss caused by a security event. Meanwhile, event feature engineering and the machine learning algorithms are continuously optimized to improve the timeliness and accuracy of event detection and early warning. When collecting security event data, the Snort intrusion detection system can be used to detect and alert on malicious traffic in the network in real time by defining rules such as "alert tcp any any -> 192.168.1.0/24 80"; the Splunk log management system collects various log data from network devices, operating systems and application systems through its Forwarder component, with the daily log collection volume reaching 50 GB. Regular expressions extract user names and similar fields, and by counting the frequencies of IP addresses and user names, frequent abnormal access and sensitive user operations are identified.
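The regex extraction and frequency counting just described can be sketched as follows; the log line format and field names are hypothetical, standing in for whatever layout the collected logs actually use.

```python
import re
from collections import Counter

# Hypothetical log format: "<ip> user=<name> ..." -- an assumption for
# illustration, not the actual syntax of any collected log source.
LOG_RE = re.compile(r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) user=(?P<user>\w+)")

def top_talkers(lines, n: int = 2):
    """Count (ip, user) pairs across log lines and return the n most
    frequent, surfacing candidates for frequent abnormal access."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            counts[(match["ip"], match["user"])] += 1
    return counts.most_common(n)

logs = ["10.0.0.1 user=alice login ok",
        "10.0.0.1 user=alice login ok",
        "10.0.0.2 user=bob login ok"]
print(top_talkers(logs, 1))
```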
A Hadoop distributed platform stores and processes the massive logs; structured logs undergo ETL cleaning through Hive SQL, unstructured text undergoes word segmentation, stop-word removal and other processing through MapReduce programs, and more than 20 characteristic fields such as attack time, attack type, attack source and target asset are extracted. A PCA principal component analysis algorithm selects the first k features whose cumulative variance contribution rate exceeds 95% as the event feature set. During event cluster analysis, a K-Means algorithm performs unsupervised clustering on events, dividing events with the closest feature vectors into the same cluster by computing the Euclidean distance between event feature vectors, with cluster number k = 5 and maximum iteration number max_iter = 100, running 10 times to obtain the optimal clustering result. For attack source IP clustering, each IP is converted into a 32-bit integer representation and normalized to reduce the influence of large numerical differences; the cosine similarity between IP vectors is then computed, and IPs with a cosine value greater than 0.8 are divided into the same cluster. An Apriori association rule mining algorithm, with minimum support min_sup = 0.05, minimum confidence min_conf = 0.8 and maximum frequent item set length max_len = 5, discovers association rules between attack events, such as 'vulnerability scanning event & brute force cracking event -> remote control event (sup = 0.08, conf = 0.85)', revealing typical penetration attack chains. 
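The support and confidence figures behind an Apriori rule such as "vulnerability scanning & brute force cracking -> remote control" can be computed directly; the sketch below evaluates one candidate rule over hypothetical session data and is not a full Apriori implementation:

```python
def rule_metrics(events, antecedent, consequent):
    """Compute support and confidence of 'antecedent -> consequent'
    over a list of event sets (one set per observed session).

    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    """
    ante = set(antecedent)
    full = ante | {consequent}
    n_ante = sum(1 for e in events if ante <= e)
    n_full = sum(1 for e in events if full <= e)
    support = n_full / len(events)
    confidence = n_full / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical sessions: 20 observed event sets.
sessions = [
    {"vuln_scan", "brute_force", "remote_control"},
    {"vuln_scan", "brute_force", "remote_control"},
    {"vuln_scan", "brute_force"},
    {"port_scan"},
] * 5
sup, conf = rule_metrics(sessions, {"vuln_scan", "brute_force"}, "remote_control")
print(sup, conf)
```

A rule is kept only when `sup >= min_sup` and `conf >= min_conf`, matching the thresholds of 0.05 and 0.8 given above.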
When the event map is constructed, the IP, port, attack type, attack time and attack target of each attack event are extracted as nodes of the map, and the nodes connected to the same event form a complete attack path. Attack severity is set as the node weight, with a high-risk attack weight of 1.0, a medium-risk attack weight of 0.6 and a low-risk attack weight of 0.3; the degree centrality and closeness centrality of a node reflect its importance in the attack. A PageRank iterative propagation algorithm calculates the importance weight of each node according to the link relations among nodes, with damping coefficient d = 0.85 and maximum iteration number max_iter = 100, and the top 10 nodes by weight serve as key attack nodes. A shortest path algorithm computes the shortest attack path among the key nodes, revealing the shortest path from initial penetration to the final control target, with the path length serving as a measure of attack complexity. When threat information association analysis is performed, IOC indicators such as domain names, IPs and sample hashes related to an event are submitted through a threat information query API to open threat information libraries such as VirusTotal and AlienVault to obtain the malicious scores of the IOCs and threat information labels such as the associated C&C servers and attack organizations; IOCs with scores greater than 7 are judged to be high-risk threats. Combining the information labels with the clustering analysis results, the threat level distribution of various events is counted, and attack events are divided into high, medium and low danger levels to form a network threat situation distribution map. 
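The PageRank iteration with damping coefficient d = 0.85 used to rank key attack nodes can be sketched as follows; the attack-event edges are hypothetical, and this simplified version lets dangling nodes contribute no outbound mass:

```python
def pagerank(edges, d=0.85, max_iter=100, tol=1e-8):
    """Iterative PageRank over a directed edge list; the highest-scoring
    nodes are treated as key attack nodes."""
    nodes = sorted({n for e in edges for n in e})
    out = {n: [t for s, t in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = {n: (1 - d) / len(nodes)
               + d * sum(rank[s] / len(out[s]) for s in nodes if n in out[s])
               for n in nodes}
        if max(abs(new[n] - rank[n]) for n in nodes) < tol:
            return new
        rank = new
    return rank

# Hypothetical attack chain: two entry vectors converge on one C2 node.
edges = [("scan", "exploit"), ("exploit", "c2"), ("phish", "exploit")]
ranks = pagerank(edges)
print(sorted(ranks, key=ranks.get, reverse=True))
```

Nodes that many attack steps funnel into accumulate the highest rank, which matches their role as the choke points of the attack graph.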
When abnormal behavior is detected, an LSTM neural network takes the user behavior sequence of the last 30 days as input, uses 100 LSTM neurons in the middle layer, and the trained model predicts the user's operation sequence; a sequence prediction error exceeding twice the standard deviation is regarded as abnormal. SHAP is used to explain the main basis of the abnormal behavior judgment, the top 5 behavior characteristics with the largest contribution are extracted, and a user abnormal behavior pattern library is generated, realizing interpretable anomaly detection.
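The two-standard-deviation rule applied to sequence prediction errors can be illustrated independently of the LSTM model itself; the actual and predicted operation counts below are hypothetical stand-ins for the model's output:

```python
from statistics import mean, stdev

def flag_sequence_anomalies(actual, predicted):
    """Mark time steps where the prediction error exceeds the mean
    error plus twice the standard deviation of the errors."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    cutoff = mean(errors) + 2 * stdev(errors)
    return [i for i, e in enumerate(errors) if e > cutoff]

# Hypothetical user operation counts vs. model predictions;
# the burst at index 4 is badly mispredicted and gets flagged.
actual = [10, 11, 12, 11, 50, 12, 11, 10]
predicted = [11, 12, 10, 12, 10, 11, 12, 11]
print(flag_sequence_anomalies(actual, predicted))
```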
In the preferred embodiment, the present application obtains risk events existing in the enterprise assets according to the system asset risk list and the data asset risk list by using a preset big data analysis technology, and these events cover key security problems such as intrusion detection, virus infection, data leakage, etc. The risk events are then systematically collected and classified, ensuring that each event is accurately identified and archived. Then, the big data analysis technology is used for deep mining, and attack means and attack paths behind each risk event are identified, wherein the step is realized by analyzing patterns and abnormal behaviors in event data. Finally, based on these means of attack and paths, the occurrence and scope of influence of each risk event is identified in detail, including how the event began, how it propagated, and its specific impact on the enterprise asset. Through the coherent analysis process, a comprehensive risk event analysis report is finally formed, the report records the detailed information of the event and the influence on the enterprise safety in detail, and decision support of risk management and response is provided for the enterprise.
The construction module 30 is configured to construct a causal chain of each risk event according to the risk event analysis report, so as to obtain a risk root of each risk event;
and constructing a risk propagation map according to the risk source, and carrying out visual display and dynamic tracking on each risk event according to the risk propagation map.
As a preferred embodiment of the second embodiment, according to the preset big data analysis method, the occurrence process and the influence range of each risk event are identified on the system asset risk list and the data asset risk list, so as to obtain a risk event analysis report, which specifically includes:
And (3) adopting causal reasoning, and reasoning fundamental factors of event occurrence according to a risk event analysis report, constructing a causal chain of event occurrence by combining asset vulnerabilities, system vulnerabilities and data flow direction factors, and determining the position and importance degree of risk points in the causal chain to obtain the root of the risk.
By adopting a root cause analysis method, a structured causal reasoning framework is formed through the process of identifying the problem, collecting data, making assumptions, verifying assumptions, identifying causes and taking corrective measures. According to the causal reasoning framework, the various vulnerability factors of technical loopholes, management defects and process omissions existing in the enterprise information system are comprehensively combed, and key risk points are identified. Full life-cycle flow tracking of the enterprise's data assets clarifies the circulation path and usage of each link of data acquisition, transmission, storage, processing and exchange. An attack graph modeling method associates the data flow paths with the identified risk factor nodes in the form of attack paths. All attack paths are enumerated by model checking and logical reasoning to form a complete causal chain for the occurrence of a risk event. CVSS (the Common Vulnerability Scoring System) quantifies the severity of each vulnerability from the angles of attack vector, attack complexity and privilege requirements, evaluating the hazard degree of each risk point. The importance of each risk point in the attack path is calculated according to the vulnerability severity, and the key risk points are determined. According to the causal chain analysis and the risk point assessment results, a risk traceability report is automatically generated, which defines the root factors, key risk points and vulnerability factors of the risk event and gives quantified indexes of the risk level, hazard score and treatment priority of each risk point.
Specifically, through causal reasoning, the critical risk events identified in the risk event analysis report are subjected to deep cause tracing and analysis. The root cause analysis method follows the flow of 'identify problem - collect data - make assumption - verify assumption - identify cause - take corrective measures', exploring the deep causes of an event layer by layer until the most fundamental cause is found, forming a structured causal tree that displays the cause-and-effect logic of the risk event. In the causal reasoning process, the various vulnerability factors existing in the enterprise information system, such as technical loopholes, management defects and process omissions, are comprehensively combed, including software and hardware vulnerabilities, weak points of the network architecture, blind spots in security protection, insufficient management systems and lack of personnel security awareness; the degree of association and the influence paths between these factors and the risk event are evaluated, and the role of each factor in the event occurrence is judged. Full life-cycle flow tracking of the enterprise's data assets clarifies the circulation paths and usage of sensitive data in the links of acquisition, transmission, storage, processing and exchange, identifies leakage risk points in the data circulation process, such as unauthorized data access, data channels lacking security protection and an overly wide data sharing scope, and analyzes the causal relationship between the weak links and risk events. 
On the basis of the risk factors identified across multiple dimensions such as asset vulnerabilities, system vulnerabilities and data flow directions, an attack graph modeling method associates the factor nodes in the form of attack paths. The basic components of an attack graph are nodes representing system states or vulnerabilities and edges representing attack behaviors. Using model checking, logical reasoning and similar techniques, all possible attack paths are enumerated to form a complete causal chain of the risk event, reflecting the complete exploitation path from the initial vulnerability point to the final hazard result and intuitively displaying the driving role of each risk factor in the event. The event causal chain is quantitatively evaluated, and the importance of each risk point in the attack path is calculated. CVSS (the Common Vulnerability Scoring System) quantifies vulnerability severity from the attack vector, attack complexity, privilege requirements and the like. The attack cost in time, money and manpower required for an attacker to complete a given attack step is estimated, as is the probability that the attacker successfully implements that step. The risk value of each node in the attack graph is then calculated according to the formula node risk value = node attack cost × node attack probability × node vulnerability CVSS score, and the node risk values on each complete attack path are accumulated to obtain the path risk value, so as to judge the key nodes on the causal chain, namely the risk points which have the greatest influence on the event and most require priority treatment. 
According to the causal chain analysis and the risk point assessment results, a risk tracing report is automatically generated, which defines the root cause, key risk points and vulnerability factors of the risk event and gives quantified indexes of the risk level, hazard score and disposal priority of each risk point. Aiming at different types of risk sources, targeted security reinforcement suggestions such as vulnerability repair, permission convergence, boundary protection, process optimization and awareness cultivation are provided, forming a scientific and efficient closed-loop risk treatment process. The results of the risk tracing analysis are fed back into the security operation flow; the integrity of the causal chain and the judgment accuracy of the key risk points are verified and evaluated through attack-defense confrontation exercises and the like, the causal reasoning model is continuously optimized to improve the reliability and practicality of the risk tracing analysis, and the possibility of risk occurrence is reduced at the source according to the risk root cause analysis results. When the root cause analysis of a risk event is performed, a fishbone diagram (Fishbone Diagram) can classify the various possible factors of the event into branches such as man, machine, material, method and environment, and deep exploration then proceeds along each branch. 
For example, for a network intrusion event, the 'man' branch may contain factors such as insufficient security awareness of managers, improper operation by staff and the proficiency of the attacker; the 'machine' branch may contain factors such as unrepaired system vulnerabilities, missing protective equipment and insufficient monitoring capability; and the 'method' branch may contain factors such as the lack of an effective security policy, an imperfect emergency plan and insufficient process control. By checking and evaluating the factors on each branch one by one, the key cause nodes are identified, and each node is then recursively subjected to deeper causal reasoning to form a complete event cause chain. For a finally located critical vulnerability, the cause of its formation needs to be further mined, whether it stems from inherent defects in the software code or from negligence in system configuration, until the root cause of the problem is traced. In the vulnerability analysis process, the security vulnerability of an asset can be systematically evaluated using the STRIDE threat modeling method across its six dimensions: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege. With an asset-threat-vulnerability correspondence matrix, the threat scenarios each type of asset may face are analyzed one by one, and the weak links of the asset in those scenarios are judged. Taking a database system as an example, in the S dimension, improper permission settings of database accounts may allow a low-privilege user to masquerade as an administrator; in the T dimension, the database may lack encryption measures, allowing sensitive data to be illegally tampered with; and in the I dimension, the audit function of the database log may be missing, so that a data leakage event cannot be traced. 
The risk level of each vulnerability is quantified on a 1-5 scale to obtain the vulnerability risk score of each type of asset, and the weakness degree of the asset is judged accordingly. When the attack graph is generated, the MulVAL (Multi-host, Multi-stage Vulnerability Analysis) logical reasoning system can take the network topology graph, host configuration information, known vulnerabilities and the like as input and automatically analyze all possible attack paths. MulVAL represents the causal relationships between attack preconditions and attack consequence states in the form of Datalog clauses, and a logical reasoning engine then recursively computes all consequence states reachable from the initial conditions. For example, the clause vulExists(webServer, VE) indicates that the web server has vulnerability VE, the clause hasAccount(attacker, webServer, root) indicates that an attacker has obtained root privileges on the web server, and a causal link exists between the two. Through iterative reasoning, a complete attack path of the form 'VE vulnerability → root privileges → intranet penetration → data theft' can be identified, revealing the progressive relationship of each weak point in the attack process. In quantitative risk assessment, besides the common CVSS indexes, factors such as asset importance, attack frequency and influence range can be introduced to construct a multi-dimensional security risk measurement model. Taking asset importance as an example, the analytic hierarchy process (AHP) can assign different assets an importance weight of 1-10 points according to factors such as asset value and data sensitivity. 
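The recursive derivation performed by MulVAL's reasoning engine can be approximated by a minimal forward-chaining loop; the clauses below are hypothetical strings written in MulVAL's style, not actual MulVAL/Datalog syntax processing:

```python
def forward_chain(facts, rules, max_rounds=10):
    """Minimal Datalog-style forward chaining: repeatedly apply every
    rule whose premises all hold until no new conclusions are derived."""
    derived = set(facts)
    for _ in range(max_rounds):
        new = {concl for prem, concl in rules
               if prem <= derived and concl not in derived}
        if not new:
            break
        derived |= new
    return derived

# Hypothetical clauses mirroring the example in the text.
facts = {"vulExists(webServer, VE)", "reachable(attacker, webServer)"}
rules = [
    (frozenset({"vulExists(webServer, VE)", "reachable(attacker, webServer)"}),
     "hasAccount(attacker, webServer, root)"),
    (frozenset({"hasAccount(attacker, webServer, root)"}),
     "reachable(attacker, intranet)"),
    (frozenset({"reachable(attacker, intranet)"}),
     "dataTheft(attacker, database)"),
]
print(sorted(forward_chain(facts, rules)))
```

Every derived consequence corresponds to one step of the 'VE vulnerability → root privileges → intranet penetration → data theft' path, making the progressive structure of the attack explicit.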
Combining the asset importance weight, attack cost, attack probability and vulnerability CVSS score of each attack node, the functional relation attack_risk(node) = asset_weight(node) × occur_prob(node) × cvss_base(node) / attack_cost(node) is established, and the risk value of each node is calculated; the higher the node risk value, the greater the node's importance in the attack chain. On this basis, the risk value of each complete attack path can be further analyzed: a factor graph probabilistic reasoning model solves the joint distribution path_risk(path) = Π_{node ∈ path} attack_risk(node), so as to judge the top-N paths most likely to be exploited by an attacker and deploy defensive resources in a targeted manner.
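The node and path risk relations above can be evaluated directly; the asset attribute values in this sketch are hypothetical:

```python
from math import prod

def attack_risk(node):
    """Node risk per the relation in the text:
    attack_risk = asset_weight * occur_prob * cvss_base / attack_cost."""
    return (node["asset_weight"] * node["occur_prob"]
            * node["cvss_base"] / node["attack_cost"])

def path_risk(path):
    """Path risk as the product of node risks along the path."""
    return prod(attack_risk(n) for n in path)

# Hypothetical nodes: a web server entry point and a database target.
web = {"asset_weight": 8, "occur_prob": 0.6, "cvss_base": 9.8, "attack_cost": 4}
db = {"asset_weight": 10, "occur_prob": 0.3, "cvss_base": 7.5, "attack_cost": 6}
print(attack_risk(web))       # ≈ 11.76
print(path_risk([web, db]))   # ≈ 44.1
```

Ranking all enumerated paths by `path_risk` and keeping the top N yields the candidate paths where defensive resources should be concentrated.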
In a preferred embodiment of the second embodiment, the application further acquires internal and external data related to the risk event through multi-channel data acquisition, wherein the internal and external data comprise threat information, a vulnerability library and a security community, and performs association analysis on the acquired data and enterprise internal data by utilizing data fusion, so as to expand the background information and the influence range of the risk event.
Data related to the security threats faced by the enterprise, including malicious IP address, domain name, URL and file hash IOC indicators, are continuously acquired by subscribing to threat information sources. Unstructured threat information is extracted and collated using natural language processing, applying named entity recognition, keyword extraction, sentiment analysis and topic modeling to obtain structured threat information data. The latest vulnerability information is periodically acquired from authoritative vulnerability databases through a web crawler. According to the vulnerability information, a knowledge graph constructs the mapping relation between enterprise assets and known vulnerabilities, judging the high-risk vulnerabilities of the software and hardware systems used by the enterprise. Underground threat information is acquired by participating in hacker forums and security community channels. For this underground threat information, social network analysis mines the association relations among users of different communities and identifies active threat participants and behind-the-scenes operators. Threat information, vulnerability data and security events are uniformly stored on a big data platform using a data warehouse. For multi-source heterogeneous data, data fusion converts data of different sources, formats and granularities into a uniform structured form through ETL processes of cleaning, normalization and association. Data mining performs association analysis between the threat information and the enterprise's internal data on the fused multi-source data set. Frequent item set and association rule mining algorithms discover the internal relations among threat events, vulnerable assets and attack techniques from the massive data. 
And dividing scattered data points into different clusters according to the similarity of the data attributes by adopting a clustering algorithm, and mining event clusters with similar attack modes and risk characteristics. And mapping the data obtained by the association analysis into nodes and edges by using a graph database, and constructing a multi-element complex association network map. On the basis of the map, a map embedding algorithm is adopted to automatically learn out low-dimensional vector representation of the event nodes, and high-order correlation among the event nodes is mined out. And finally, integrating the results of data fusion, association analysis, clustering and mapping together to form a more comprehensive, three-dimensional and accurate depiction of the risk event, and providing data support for subsequent risk assessment, early warning and decision making.
Specifically, by subscribing to threat information sources, including open-source, commercial and industry sources, data related to the security threats faced by the enterprise, such as IOC indicators (malicious IP addresses, domain names, URLs, file hashes and the like) and background information on APT organizations and attack techniques, are continuously acquired. Unstructured threat information is extracted and collated using natural language processing techniques: named entity recognition, using regular expressions, dictionary matching and similar techniques, extracts entity objects such as IP addresses, domain names and hash values from the information text; keyword extraction algorithms such as TF-IDF and TextRank automatically identify the keywords of a threat information item, and its topic attributes are classified accordingly; sentiment analysis judges the emotional tendency in information descriptions to gauge the severity of attack events, the risk of vulnerabilities and the like; and topic modeling methods such as LDA and LSI discover the latent topic distribution from a large information corpus, realizing clustering of threat information. 
The latest vulnerability information, including vulnerability descriptions, affected versions and exploit code, is periodically obtained through web crawler technology from authoritative vulnerability databases such as the NVD (National Vulnerability Database) and the CVE (Common Vulnerabilities and Exposures) list; a knowledge graph constructs the mapping relation between enterprise assets and known vulnerabilities, so that possible high-risk vulnerabilities in the software and hardware systems used by the enterprise are found in time and their potential influence on the business system is evaluated. By participating in channels such as hacker forums and security communities, underground threat information such as malware sales information, data leakage samples and the latest attack methods of attack gangs is obtained; social network analysis mines the association relations among users of different communities, identifies active threat participants and behind-the-scenes operators, and provides insight into the origins of targeted attack activities. Based on this multi-source heterogeneous data acquisition, threat information, vulnerability data, security events and the like are uniformly stored on a big data platform such as Hive using data warehouse technology, facilitating integrated management and association analysis of the data. Meanwhile, data fusion converts data of different sources, formats and granularities into a uniform structured form through ETL processes of cleaning, normalization and association. 
In the data cleaning link, the original data undergo preprocessing such as deduplication, denoising and format conversion to improve data quality; in the normalization link, similar data from different sources are given unified field names, data types and measurement units to eliminate semantic ambiguity; and in the association link, scattered data are merged and connected according to key attributes such as timestamps and object IDs, constructing a relational network among the data. Finally, a global associated view of the multi-source security data is established, laying the foundation for deep mining analysis. Data mining technology performs association analysis between the threat information and the enterprise's internal data on the fused multi-source data set. Frequent item set and association rule mining algorithms such as Apriori and FP-Growth discover the internal relations among threat events, vulnerable assets and attack methods from the massive data; for example, finding that a certain family of malware is usually associated with specific C&C servers allows the behind-the-scenes actors of a targeted attack campaign to be inferred. Clustering algorithms such as K-Means and DBSCAN divide scattered data points into different clusters according to the similarity of data attributes, mining event clusters with similar attack patterns and risk characteristics and identifying security events with a wide influence range. Statistical characteristics of each event cluster, such as its time distribution, spatial distribution and target types, are analyzed to judge the severity and potential risk of the events. 
The data obtained by association analysis are mapped into nodes and edges using a graph database technology such as Neo4j, constructing a multi-element complex association network map that intuitively presents the evolution track and propagation path of a risk event across dimensions such as time, space and logic. Based on the map, a graph embedding algorithm such as DeepWalk or GraphSAGE automatically learns a low-dimensional vector representation of each event node. The core idea of graph embedding is to map the nodes of a graph into a low-dimensional vector space so that nodes linked in the graph also have similar vector representations. This representation learning method automatically extracts the semantic features contained in the network structure and mines high-order correlations among event nodes. Based on the vectorized representation of event nodes, algorithms such as spectral clustering and K-nearest neighbors can further mine more hidden event associations from the perspective of semantic similarity. Finally, the results of data fusion, association analysis, clustering and mapping are integrated to form a more comprehensive, three-dimensional and accurate depiction of the risk event, providing data support for subsequent risk assessment, early warning and decision making. In the process of acquiring and extracting threat information, regular expressions can rapidly extract IOC indicators from unstructured information text. For an extracted IOC indicator, a sentiment dictionary and dependency syntax analysis can judge whether its context indicates maliciousness. If the description 'domain.com implanted malicious code' appears in the information, dependency analysis of the relation between 'implanted' and 'malicious' allows the domain name to be judged malicious. 
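The regular-expression IOC extraction can be sketched as follows; the patterns are illustrative, cover only a few indicator types, and do not handle defanged indicators such as 'evil[.]com':

```python
import re

IOC_PATTERNS = {
    "ip": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "domain": re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.(?:com|net|org|io)\b"),
}

def extract_iocs(text):
    """Pull IP, MD5-hash and domain IOC indicators out of
    unstructured threat information text."""
    return {kind: pat.findall(text) for kind, pat in IOC_PATTERNS.items()}

# Hypothetical intelligence snippet.
report = ("Beacon to evil-c2.com from 203.0.113.7, dropper hash "
          "d41d8cd98f00b204e9800998ecf8427e observed.")
print(extract_iocs(report))
```

In practice each extracted indicator would then be checked against its surrounding context (as the dependency-analysis example above describes) before being treated as malicious.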
When classifying information, a word embedding model such as Word2Vec can train word vectors in Skip-gram mode, and the similarity between information keywords and different categories is computed by cosine similarity, with a similarity greater than 0.6 assigning the item to the corresponding category. In crawling structured vulnerability libraries such as the NVD, an XPath parser can locate the '/NVD:feed/NVD:entry' nodes and extract label information such as description and harm beneath them, the BeautifulSoup library can parse key fields such as the CVE number, CVSS score and affected versions of each vulnerability, and a Word Mover's Distance algorithm can compute the text distance between enterprise asset version information and the vulnerability's affected versions, with a distance smaller than 1.5 considered a match, thereby generating an asset-vulnerability mapping map. In the social network analysis process, a PageRank iterative algorithm can be adopted: a directed relation graph is constructed from user interaction behaviors, the damping coefficient is set to 0.85, the initial node weights are all 1, user importance scores are computed iteratively until convergence, and users with scores greater than 10 are selected as key opinion leaders. Topic clustering is performed on the post contents published by key users in different communities using a non-negative matrix factorization (NMF) algorithm with topic number k = 5: TF-IDF features of the posts are extracted, a topic-word matrix and a user-topic matrix are obtained by minimizing the reconstruction error, and topic words with weights greater than 0.3 are taken as the main topics discussed by the users, thereby revealing the interaction characteristics and topic trends among different communities and users. 
In data fusion and association analysis, the multi-source heterogeneous data are stored in a graph database, with data cleaning, normalization and graph construction implemented in the Cypher query language. Entity objects and relation edges are inserted into the graph store through CREATE and MERGE statements, and the natural node-edge-attribute representation makes it convenient to mine multi-hop associations among entities. In association rule mining, an FP-Growth algorithm can be adopted with minimum support 0.05 and minimum confidence 0.8 to generate frequent item sets and association rules: entities such as attack gangs, attack events and target assets are converted into elements of item sets, and frequent co-occurrence patterns among them are found through iterative tree building and recursive mining, revealing the intrinsic mechanism of malicious activities. In cluster analysis of multi-source data, a DBSCAN density-based clustering algorithm can identify clusters of density-connected points by computing Euclidean distances among data points. The radius parameter eps = 0.5 and density threshold min_samples = 5 are set, i.e., when there are at least 5 points in the neighborhood of radius 0.5 around a data point, they are density-connected into a cluster. For noise points, the local outlier factor can be computed with the LOF outlier detection algorithm, and points with a factor greater than 1.5 are regarded as outliers and treated separately. 
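The density-connection rule of DBSCAN with eps = 0.5 and min_samples = 5 can be sketched in a minimal form; here a point's neighbourhood includes the point itself, and the sample coordinates are hypothetical:

```python
def dbscan(points, eps=0.5, min_samples=5):
    """Minimal DBSCAN: density-connected points share a cluster label;
    points in no dense neighbourhood get label -1 (noise)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n = len(points)
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = list(neighbours[i])
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point absorbed into cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_samples:
                seeds.extend(neighbours[j])  # expand from core points only
    return labels

# Six tightly packed event points plus one far-away noise point.
pts = [(0, 0), (0.1, 0), (0.2, 0), (0, 0.1), (0.1, 0.1), (0.2, 0.1), (5, 5)]
print(dbscan(pts))
```

Production code would use scikit-learn's `DBSCAN`, but the sketch shows why the isolated point ends up labeled -1 and handed to the LOF step.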
In the clustering process, attack time, source IP, destination port and the like are selected as feature dimensions so that multiple attributes of attack events are clustered jointly, the similarity of events is comprehensively described, and event clusters with the same attack pattern are mined. For each event cluster, time-series analysis and statistical hypothesis testing can judge whether the cluster shows a sudden burst of growth over time and whether its event distribution differs significantly from a random distribution, so as to infer the severity and influence range of the events. In graph-embedding-based correlation analysis, a DeepWalk algorithm can generate node sequences through random walks and train them with a Skip-gram model, optimizing the log likelihood of node co-occurrence to obtain low-dimensional vector representations of the nodes. The random walk length is set to 10 steps, the window size to 5 and the negative sampling number to 5, and 200 epochs are trained iteratively, finally mapping the nodes into a 128-dimensional vector space. In this space, the Euclidean distance between event node vectors can be computed, with a distance smaller than 0.5 regarded as indicating semantically similar nodes; a spectral clustering algorithm then computes the similarity matrix among the nodes and performs K-Means clustering on it, finally discovering the event association patterns hidden behind the graph structure.
As a preferred embodiment of the second embodiment, the step of constructing a risk propagation map according to the risk source and performing visual display and dynamic tracking of each risk event according to the risk propagation map specifically includes:
Risk tracing is adopted to trace back the propagation path and influence range of risk according to the causal chain of a risk event; the key nodes and propagation paths of the risk are identified through the dependency relationships and data flow directions among assets, and a risk propagation map is constructed to realize visual display and dynamic tracking of the risk.
Multi-source heterogeneous data such as network traffic data, system operation logs and database access records are collected in real time by deploying security monitoring tools such as an intrusion detection system (IDS), a log audit system and a database audit system. The massive heterogeneous data is stored, cleaned, correlated and analyzed using Hadoop and Spark big data processing platforms, and suspicious security events and abnormal behavior patterns in the network are identified. For an identified security risk event, causal reasoning such as root cause analysis is adopted to trace back from the attack result to the attack origin, analyzing the antecedent causes and consequences of the event layer by layer and judging the temporal association and causal dependency between events. The information assets and network architecture of the enterprise are comprehensively inventoried by collecting asset configuration information, the network topology structure and protection policy rules, and an asset library and a knowledge base covering the whole scene are constructed. Every IT, OT and IoT asset in the network is discovered by asset mapping; the physical and logical position, business application, communication protocol and vulnerability information of each asset are identified, the asset topology graph and dependency graph of the enterprise are drawn, and cross-segment and cross-region asset connections are presented visually. Full life cycle management of data assets is emphasized: a data leakage protection system is deployed, and behavior events from the creation, storage, circulation and use stages of sensitive data assets are collected. Metadata of the data assets is extracted through data lineage analysis, and a complete traceability map of each data asset from generation to use is constructed.
On the basis of the risk propagation map, real-time service monitoring data is integrated; stream computing and complex event processing are applied to associate service anomalies with security risk events in real time, dynamically infer the potential propagation paths of risk in the asset network, and track and warn of the evolution of the risk.
Specifically, by deploying security monitoring tools such as an IDS, a log audit system and a database audit system, multi-source heterogeneous data such as network traffic data, system operation logs and database access records are collected in real time; the massive data is stored, cleaned, correlated and analyzed using big data processing platforms such as Hadoop and Spark, suspicious security events are identified, abnormal behavior patterns in the network are discovered, and a data basis is provided for risk tracing analysis. For an identified security risk event, a causal reasoning technique such as root cause analysis is adopted to trace back from the attack result to the attack origin, analyzing the antecedent causes and consequences of the event layer by layer. Using the temporal relations and correlations between security events, whether one event is the cause of another is judged through methods such as the Granger causality test from time series analysis. In addition, request-response logs can be analyzed through process mining to reveal the calling dependencies among system components, finally forming a complete causal topology and event chain. The information assets and network architecture of the enterprise are comprehensively inventoried; asset configuration information, the network topology structure, protection policy rules and the like are collected, an asset library and a knowledge base covering the whole scene are constructed, and a mechanism for asset life cycle management and configuration change management is formed.
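A full Granger test fits autoregressive models and compares residual variances; as a deliberately simplified stand-in for the temporal-precedence idea described above, the sketch below merely correlates one event-count series against the lagged values of another (the series and the lag of 1 are hypothetical):

```python
import math

def lagged_corr(x, y, lag=1):
    """Pearson correlation between x[t-lag] and y[t]: a high value
    suggests (but does not prove) that x temporally precedes y."""
    xs, ys = x[:-lag], y[lag:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)
```

If event series y simply reproduces x one step later, the lag-1 correlation is exactly 1.0; a real deployment would use a proper Granger F-test instead.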
By adopting asset mapping, every IT, OT and IoT asset in the network is discovered; the physical and logical position, owning business application, communication protocol, vulnerability information and the like of each asset are identified, the asset topology graph and dependency graph of the enterprise are drawn, cross-segment and cross-region asset connections are presented visually, and the security boundary and attack surface are understood. Based on the asset mapping results, the vulnerability information, privilege information and the like of each asset node are mapped into an attack graph model using attack graph construction. The network reachability between hosts is abstracted as the directed edges of the attack graph, and vulnerable ports and services on the hosts are abstracted as nodes; the CVSS scores from the CVE vulnerability library are used to weight the danger degree of each node, finally forming a logical attack graph with well-defined node and edge semantics. On this basis, a causal graph model such as formalized predicate logic or a Bayesian network is used to define inference rules between attack preconditions and attack result states, and all potential attack paths from vulnerable points to key assets are analyzed automatically through forward and backward reasoning over causal links. Full life cycle management of data assets is emphasized: a data leakage protection (DLP) system is deployed, and behavior events from the creation, storage, circulation and use stages of sensitive data assets are collected. Metadata of the data assets, including the data source, transformation process and destination, is extracted through data lineage analysis, a traceability map of the data assets is constructed, and the complete circulation link of sensitive data from generation to final use is restored.
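The attack-path analysis described above can be sketched as a depth-first enumeration of simple paths from an entry point to a key asset, with each path weighted by the CVSS scores of the nodes it traverses (the host names and scores below are hypothetical):

```python
def attack_paths(edges, cvss, start, target):
    """Enumerate simple attack paths from start to target and
    score each by the sum of CVSS weights along it."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    paths = []

    def dfs(node, path):
        if node == target:
            paths.append((path, sum(cvss.get(n, 0.0) for n in path)))
            return
        for nxt in adj.get(node, []):
            if nxt not in path:          # keep paths simple (no cycles)
                dfs(nxt, path + [nxt])

    dfs(start, [start])
    # highest-weight (most dangerous) path first
    return sorted(paths, key=lambda p: -p[1])
```

On a toy graph web→app→db with a direct web→db edge, both paths to the database are found and the three-hop path through the vulnerable application server scores highest.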
Static analysis is performed on database SQL statements, ETL task scripts, program code and the like to extract the dependency relations between tables and fields; data access and transmission behaviors are obtained by scanning web logs and data packets, the operation sequences of data transfers are reconstructed, the massive lineage relations are automatically abstracted and simplified through a machine learning clustering algorithm, and finally the key paths of data transfer are revealed. In addition, a data flow graph modeling method is used to formally define the data exchange interfaces among different business systems, visually display cross-system and cross-network access and sharing of sensitive data, and discover abnormal flows and illegal operations in time. In the process of constructing the map, a rule-based reasoning engine such as Datalog, or a cost-based shortest path algorithm, is used to mine the implicit relations among events, assets, threats and vulnerabilities; with graph embedding, models such as DeepWalk and Node2Vec sample a large number of node sequences on the map by random walks, a word embedding model learns a low-dimensional vector representation of each node, and the semantic similarity of the vectors then describes the deep relations among entities. Finally, the map can not only reveal the causal propagation process of each risk event in time, space and logic, forming a local risk propagation subgraph that intuitively answers how a risk reaches key assets step by step, but can also describe the risk topology of the whole enterprise globally, answering which weak points and which paths have the greatest risk propagation influence.
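The lineage reconstruction described above can be sketched by loading table-level lineage triples into a directed graph and tracing every transfer path from a source table to a destination with a breadth-first search (the table names and triples are hypothetical):

```python
from collections import deque

def lineage_paths(triples, source, sink):
    """Trace every table-level transfer path from source to sink
    using <source table, target table, operation> lineage triples."""
    adj = {}
    for src, dst, _op in triples:
        adj.setdefault(src, []).append(dst)
    paths, queue = [], deque([[source]])
    while queue:
        path = queue.popleft()
        if path[-1] == sink:
            paths.append(path)
            continue
        for nxt in adj.get(path[-1], []):
            if nxt not in path:        # avoid cyclic lineage
                queue.append(path + [nxt])
    return paths
```

Both the direct copy and the multi-hop ETL route are recovered, which is exactly the information a traceability map needs to display.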
On the basis of the risk propagation map, real-time service monitoring data is integrated; technologies such as stream computing and complex event processing dynamically associate the occurrence of service anomalies with security risk events, infer the potential propagation paths of risk in the asset network in real time, predict the influence range of the next hop, and perform real-time tracking and situation awareness of the risk propagation process. Meanwhile, deep learning models such as graph neural networks are introduced, and models such as a GCN are applied on the risk propagation graph to learn an implicit state representation for each node. This representation incorporates the inherent properties and vulnerability features of the asset itself on the one hand, and on the other hand recursively aggregates the risk states of neighboring nodes via the message passing mechanism of the graph, forming a contextual representation of asset risk. In training the graph neural network, an attention mechanism can be integrated, adaptively adjusting the weight coefficients of neighbor risk aggregation according to the importance of different asset nodes and the relevance of the attack/dependency edges between nodes. After the model is trained, the subsequent propagation trend of a risk can be predicted online from the real-time states of the assets and the network, and high-risk propagation paths can be warned of in advance. Finally, in the quantitative evaluation model, the risk nodes on a propagation path are scored along multiple dimensions such as asset value, data importance and attack cost, high-risk propagation events are responded to in time, and multiple disposal decision bases such as node blocking, policy optimization, defense in depth and attack tracing are provided.
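The message-passing aggregation described above can be sketched without any deep learning framework: in each round, a node blends its own risk with a softmax-weighted sum of its in-neighbors' risk, where the weights depend on the edge type. The edge-type logits and the 0.5 blending factor below are assumptions for illustration, standing in for the learned attention weights of a trained GNN:

```python
import math

def aggregate_risk(node_risk, edges, rounds=2):
    """Toy message passing on a risk propagation graph.
    edges: (src, dst, edge_type) tuples; risk flows src -> dst."""
    type_logit = {"attack": 1.0, "depend": 0.3}   # assumed, not learned
    risk = dict(node_risk)
    for _ in range(rounds):
        nxt = {}
        for v in risk:
            inc = [(u, type_logit[t]) for u, w, t in edges if w == v]
            if not inc:
                nxt[v] = risk[v]      # no incoming risk: keep own state
                continue
            z = sum(math.exp(l) for _, l in inc)          # softmax norm
            agg = sum(math.exp(l) / z * risk[u] for u, l in inc)
            nxt[v] = 0.5 * risk[v] + 0.5 * agg            # blend self + ctx
        risk = nxt
    return risk
```

With a single attack edge from a fully compromised host to a clean one, two rounds raise the downstream node's contextual risk from 0.0 to 0.75, showing how risk state diffuses hop by hop.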
In security data acquisition and anomaly detection, an open-source IDS such as Suricata can be deployed to detect port scanning behavior in the network in real time; traffic analysis tools such as Zeek extract communication data such as TCP, UDP and ICMP among the hosts, a distributed computing framework such as MapReduce counts the traffic distribution characteristics of each IP over a period of time, a one-class classifier such as a One-Class SVM is trained to build a baseline of normal behavior, a neural network model such as an RNN learns the temporal pattern of each IP communication sequence and predicts the traffic distribution at the next moment, and an abnormal event is judged when the JS divergence between the actual and predicted distributions is greater than 1.2. In causal analysis, the PC-stable algorithm can be used to find causal dependencies among event variables through conditional independence tests, and candidate causal graph models are scored by criteria such as AIC and BIC to construct the optimal causal structure. Multiple independent causal chains can be linked into one causal graph through a Markov logic network (MLN), in which a first-order logic formula can represent an attack rule; for example, "vulExists(H, V) ∧ vulLinkHost(V, H) → execCode(H)" indicates that a host H with vulnerability V can be compromised. In asset mapping, the Nmap port scanner can be used with SYN half-open scanning to scan all ports of every IP address in the enterprise network at a concurrency of 2000; from the response message fingerprints of the open ports, machine learning algorithms such as decision trees and random forests automatically identify the service type, version and device type, and the average accuracy can exceed 90%.
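The JS-divergence criterion above can be sketched directly from its definition. Note that with the natural logarithm the divergence is bounded by ln 2 ≈ 0.693 (and by 1 with log base 2), so a threshold such as 1.2 would have to apply to a scaled variant of the statistic; the threshold here is left as a caller-supplied parameter:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete
    probability distributions given as equal-length lists."""
    def kl(a, b):
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def is_anomalous(actual, predicted, threshold):
    """Flag an abnormal event when divergence exceeds the threshold."""
    return js_divergence(actual, predicted) > threshold
```

Identical distributions score 0, and fully disjoint ones reach the ln 2 maximum, which is what makes the divergence usable as a bounded anomaly score.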
For key assets, configuration parameters can be obtained remotely using protocols such as WMI and SSH, patch information, account privileges and the like are extracted, and the CMDB configuration item attributes of the key assets are evaluated. In attack graph construction, a depth-first search algorithm may be employed to recursively trace attack paths back from the leaf nodes of the attack graph until its root node (i.e., the initial attack surface) is found. For the cyclic dependency problem, Tarjan's strongly connected components algorithm for directed graphs can be adopted to find all cyclic dependency clusters, and each cluster is abstracted into a super node. For each node, its risk value may be evaluated using the CVSS scoring criteria over 8 dimensions such as attack complexity and attack vector, multiplied by an asset importance score to obtain the final risk score of the node; the weighted average of the scores of all non-zero-risk nodes is the risk score of the attack path. In data tracing analysis, ETL tools such as DataStage can periodically scan the enterprise data warehouse and extract the lineage relations of tables and fields to form lineage triples in the format <source table, target table, operation type>; for example, <table_A, table_B, insert> indicates that data flows from table A to table B. A spectral clustering algorithm is then used to group highly related lineage relations into clusters through eigenvector decomposition of the table-to-table adjacency matrix, where each cluster is a key path of data transfer.
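The node and path scoring described above reduces to simple arithmetic; the sketch below multiplies a CVSS base score by an asset importance factor and averages the non-zero node scores along a path (the example scores are hypothetical):

```python
def node_risk(cvss_score, importance):
    """Final risk score of a node: CVSS base score scaled by
    an asset importance factor (e.g. 0.0 - 1.0)."""
    return cvss_score * importance

def path_risk(node_scores):
    """Risk score of an attack path: the average over its
    non-zero-risk nodes, as described in the text."""
    nonzero = [s for s in node_scores if s > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0
```

A path through a critical database (CVSS 9.8, importance 1.0) and a less important host (CVSS 7.5, importance 0.5), plus one zero-risk hop, thus averages only the two risky nodes.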
For complex lineage networks, a community discovery algorithm such as the Louvain algorithm can be used: the modularity of candidate communities is calculated first, with values between -1 and 1; communities with a value greater than 0.3 are regarded as closely related lineage clusters, the centrality of each community is calculated, and the node with the highest centrality marks the key data stream. In risk map inference, gStore or another open-source graph database may be used, and risk propagation rules are described in the SPARQL language. The rules are organized into a rule tree, and all implicit risk propagation relations are deduced by a backward-chaining reasoning algorithm that recursively triggers the inference process over the rule tree starting from known facts. For representation learning on the map, the Metapath2Vec heterogeneous graph embedding model can be used to define meta-paths of risk propagation, such as threat - vulnerability - vulnerable asset - sensitive data; meta-path-guided random walks generate the contexts among different risk elements, a Skip-gram word embedding model vectorizes them, and finally each risk node is represented by an 80-dimensional real vector. In the risk tracing graph neural network, the intrinsic attributes of each node in the graph, such as its CVSS risk value and asset importance, can be encoded into the node's feature vector, and node features are aggregated over a 4-hop neighborhood through graph convolution, with an attention mechanism used for the aggregation. For different edge types, such as attack and dependency, different attention weights can be learned, and neighbor information is aggregated accordingly.
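The modularity criterion underlying the Louvain threshold above can be sketched from Newman's formula Q = Σ_c (L_c/m − (d_c/2m)²), where L_c is the number of intra-community edges, d_c the degree sum of community c, and m the total edge count (the two-triangle example graph is hypothetical):

```python
def modularity(adj, communities):
    """Newman modularity Q of an undirected adjacency-list graph
    for a given partition into communities."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2   # total edge count
    q = 0.0
    for comm in communities:
        members = set(comm)
        # intra-community edges (each counted twice, hence / 2)
        l_c = sum(1 for u in members for v in adj[u] if v in members) / 2
        d_c = sum(len(adj[u]) for u in members)       # degree sum
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q
```

Two triangles joined by a single bridge edge, split into their natural communities, give Q = 5/14 ≈ 0.357, which clears the 0.3 threshold the text uses for a closely related lineage cluster.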
During model training, the asset nodes involved in historical attack events are taken as positive risk samples, an equal number of non-attacked assets are randomly sampled as negative samples, and the model parameters are optimized through a binary cross entropy loss function so that the predicted probability of risk nodes is greater than 0.5 and that of non-risk nodes is less than 0.5. After the model converges, it can be used for influence range prediction and hazard assessment of unknown risk events.
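The binary cross entropy objective above can be sketched in a few lines, together with the 0.5 decision threshold (the probabilities and labels are illustrative, not model outputs):

```python
import math

def bce_loss(probs, labels):
    """Mean binary cross entropy between predicted probabilities
    and 0/1 risk labels; eps guards against log(0)."""
    eps = 1e-12
    return -sum(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
                for p, y in zip(probs, labels)) / len(labels)

def predict_risky(probs, threshold=0.5):
    """Apply the decision threshold from the text."""
    return [p > threshold for p in probs]
```

Confident, correct predictions (0.9 for a risk node, 0.1 for a safe one) yield a small loss, and thresholding at 0.5 reproduces the labels.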
In this preferred embodiment, having obtained a risk event analysis report and identified a causal chain for each risk event, the present application uses this information to trace back the propagation path and influence range of each risk event. By analyzing the source of a risk, it can be inferred how the risk spreads across the network or system and which assets are affected. A risk propagation map is then constructed from the data on these propagation paths and influence ranges. The map visually reveals the propagation process of the risk, so that the security team can intuitively see how a risk propagates from one point to another and the dependencies between them. In addition, through dynamic tracking, changes and developments of a risk can be monitored in real time, so that risk events are responded to and managed more effectively. This helps the enterprise understand the nature of its risks and also provides a way to monitor and mitigate their effects.
The asset risk tracking and tracing method comprises: obtaining network asset basic data comprising an enterprise network topology graph and an asset vulnerability risk list, drawing a system asset topology graph to identify the security risk points of key system assets, and forming a system asset risk list. A data asset flow chart is then drawn according to the system asset topology graph, the security risks of sensitive data assets are identified, and a data asset risk list is generated. The occurrence process and influence range of each risk event are tracked using big data analysis in combination with the system asset risk list and the data asset risk list, and a risk event analysis report is compiled. Finally, a causal chain of each risk event is constructed from the information in the risk event analysis report, the risk source is traced, and a risk propagation map is constructed, thereby providing dynamically adaptive network security protection for the enterprise and realizing effective identification and management of enterprise asset risks.
The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.