Disclosure of Invention
In view of this, in order to solve the problems set forth in the background art, a network space data security monitoring and analyzing method is now provided.
The aim of the invention can be achieved by the following technical scheme: the invention provides a network space data security monitoring and analyzing method, which comprises the following steps: step one: extracting source code: after a user logs in to a browser and accesses a page, obtaining the source code of the currently accessed page, and then extracting the text set and the website set from the source code.
Step two: text set analysis: analyzing a text set in the source code, wherein the text set comprises static texts and directory texts of websites, identifying whether inducible texts exist, and if the inducible texts exist, judging whether the inducible texts are the directory texts of the websites.
Step three: screening of key marked websites: when the directory text of a website is inducible text, removing the websites corresponding to the inducible text from the website set to obtain the remaining websites, analyzing the suspicion degree of each remaining website to obtain the abnormal websites and the key marked websites, and shielding the abnormal websites.
Step four: safety monitoring: after a user enters the page corresponding to a key marked website, counting the number of key marked websites in the entered page, and if the number exceeds a set value, closing the entered page.
Step five: irregular time setting: acquiring the system security log of the browser, and setting an irregular extraction rule according to the user's access pattern in the browser.
Step six: log analysis: performing irregular-time analysis on the system security log, detecting abnormal events in the system security log, and judging whether an autonomous searching and killing operation needs to be executed on the browser.
Specifically, analyzing the text set in the source code includes: using an HTML parsing library to convert the source code into an operable document object model, through which all text content and hyperlink elements, i.e., the text set and the website set in the source code, are located.
An inducible text keyword library is constructed, each inducible text in the text set is identified by a keyword filtering method, and the proportion of inducible text in the text set is counted and compared with a set proportion threshold; when the proportion exceeds the set proportion threshold, the current webpage is closed; otherwise, it is judged whether each inducible text is the directory text of a website.
The specific steps of judging whether an inducible text is the directory text of a website are: acquiring the position of the inducible text in the source code, and using a URL parsing function to identify whether a website exists at that position in the source code; if not, the inducible text is static text, and marking and shielding operations are performed on the static text; if a website exists, the inducible text is the directory text of that website, and the website is located and shielded.
Specifically, the suspicion degree of each remaining website is analyzed as follows: C1: acquiring the protocol of each remaining website; if the protocol of a remaining website is HTTPS, acquiring the content of its protocol certificate and verifying the protocol certificate compliance ε; otherwise, recording the protocol certificate compliance of that remaining website as 1. This yields the protocol certificate compliance of each remaining website, εk = ε or 1, where k is the index of the remaining website, k = 1, 2, ..., u.
C2: simulating user behavior with an automated testing tool to trigger the redirections of each remaining website, obtaining the number of redirections and the redirection paths of each remaining website, and calculating the redirection-path suspicion degree of each remaining website. In this calculation, ρ0 is the set suspicion degree adjustment coefficient, M is the number of redirections of the remaining website, M' is the set threshold for the number of redirections, and ρk' is the redirection path validity influence weight of the k-th remaining website.
C3: calculating the suspicion degree of each remaining website. In this calculation, ρ' and ε' are the set reference values of the redirection-path suspicion degree and the protocol certificate compliance, respectively, λ1 and λ2 are the set weight proportions corresponding to the redirection-path suspicion degree and the protocol certificate compliance, respectively, and e is the natural constant.
Specifically, the redirection path validity influence weight of a remaining website is analyzed as follows: obtaining the target URL of each redirection path of the remaining website, sending a simulated request to each target URL with a network monitoring tool, obtaining the returned response content through the interface provided by the tool, and storing the response content as variable indicators, which include the IP address corresponding to the website domain name, the HTTP status code of the website, and the content returned by the URL.
Verification rules are designed according to the expected content, and each variable indicator is analyzed and verified to obtain its verification result, which is either valid or invalid.
If the verification result of a variable indicator is valid, its influence weight is recorded as 1; otherwise it is recorded as 0. The influence weights of the variable indicators of a path are added to obtain the comprehensive influence weight of that path, and the comprehensive influence weights of all paths are added to obtain the redirection path validity influence weight of the website.
Specifically, the key marked websites are extracted as follows: setting a suspicion degree threshold range; if the suspicion degree of a remaining website is smaller than the minimum value of the threshold range, marking the remaining website as an abnormal website and intercepting and shielding it; if the suspicion degree of a remaining website falls within the threshold range, marking the remaining website as a key marked website.
Specifically, the irregular extraction rule is set as follows: F1: determining an initial time range, generating a random time point within the initial time range using a random function, and taking the random time point as the start time point t_start of the timed task execution.
F2: at the end time of the initial time range, extracting from the system security log the peak access time within the initial time range, denoted t_peak, and taking t1 = t_peak + |t_peak - t_start| as the first irregular time.
F3: taking t_peak and t1 as the start access time and the end access time of the next time range, extracting from the system security log, at the end access time of the next time range, the peak access time t_peak' within the next time range, and taking t2 = t1 + (t_peak' - t_peak) as the second irregular time; irregular times continue to be set according to this rule for the duration of the user's access.
Specifically, performing irregular-time analysis on the system security log includes: acquiring the system security log at the time points given by the irregular extraction rule, extracting the download events and abnormal login events within the current time range from the system security log, and analyzing the anomaly coefficient of the download event and the anomaly coefficient of the abnormal login event, denoted δ1 and δ2, respectively.
The abnormal event influence coefficient δ of the system security log is then evaluated, where τ is the correction factor of the abnormal event influence coefficient. When δ is greater than δ0, it is judged that the browser needs to execute the autonomous searching and killing operation, where δ0 denotes the set abnormal event influence coefficient threshold.
Specifically, the anomaly coefficient of a download event is analyzed as follows: identifying whether the download event is an autonomous download action of the user; if the download event is a webpage-initiated download action, acquiring the download source website, issuing an early warning for the download source website, and taking ψ0 as the anomaly coefficient of the download event.
If the download event is an autonomous download action of the user, extracting the webpage upload data of the downloaded file corresponding to the download event and the user-side download data, and comparing them to calculate the health index ψ of the downloaded file package of the download event. In this calculation, s1 and s2 are the download start time and download end time in the user download data, respectively, B is the downloaded file size, v is the set normal download duration per unit file size, Δv is the set allowable download speed error, and ζ is the degree of abnormality corresponding to the downloaded file size in the user download data. ψ is then taken as the anomaly coefficient of the download event, so the anomaly coefficient of the download event is δ1 = ψ or ψ0.
Specifically, the anomaly coefficient of an abnormal login event is analyzed as follows: recording each abnormal login event that occurred before the start access time of the current time range as a historical login event, acquiring the login information of the abnormal login event at the current time point, comparing it with the login information of each historical login event, calculating the login address evaluation coefficient of each historical login event, and counting the number of historical login events whose coefficient exceeds the set login address evaluation coefficient threshold, denoted Y.
The login device is extracted from the login information of the abnormal login event at the current time point and compared with the user's commonly used devices, and the anomaly coefficient δ2 of the abnormal login event is calculated. In this calculation, Y' is the number of historical login events, σ is the set deviation correction factor corresponding to the anomaly coefficient of the abnormal login event, U is a set constant greater than 2, P denotes the state in which the login device can be matched with a commonly used device of the user, β1 is the login device influence weight set for the P state, and β2 is the login device influence weight set for the non-P state, in which the login device cannot be matched.
Compared with the prior art, the invention has the following beneficial effects: (1) The invention analyzes the website paths in a page to judge their hidden suspicion degree, and key-marks the websites whose suspicion degree exceeds the path threshold. When a user accesses a key marked website, the website content is identified again, which makes the user more alert when accessing it and helps the user better identify and avoid potentially risky websites, thereby improving the user's security and trust. Re-identifying the website content also provides the user with more accurate, real, and useful information and optimizes the browsing experience.
(2) The invention makes irregular searching-and-killing judgments on the browser according to the irregular time setting rule, so that the browser content can be updated in time and malicious software can be found and cleared, preventing its spread in the browser and thereby reducing the user's risk of network attack.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Referring to fig. 1, the present invention provides a network space data security monitoring and analyzing method, which includes: step one: extracting source code: after a user logs in to a browser and accesses a page, obtaining the source code of the currently accessed page, and then extracting the text set and the website set from the source code.
Step two: text set analysis: analyzing a text set in the source code, wherein the text set comprises static texts and directory texts of websites, identifying whether inducible texts exist, and if the inducible texts exist, judging whether the inducible texts are the directory texts of the websites.
In a preferred embodiment, analyzing the text set in the source code includes: using an HTML parsing library to convert the source code into an operable document object model, through which all text content and hyperlink elements, i.e., the text set and the website set in the source code, are located.
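The extraction described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the invention: it uses the standard-library `html.parser` event parser rather than a full document object model, and the names `TextAndLinkExtractor` and `extract_sets` are hypothetical. Each text fragment is stored together with the hyperlink it sits inside (if any), so that its position relative to a website is retained for later steps.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Collects text fragments and hyperlink targets from page source code."""

    def __init__(self):
        super().__init__()
        self.text_set = []      # list of (text, href or None)
        self.website_set = []   # list of href values found in <a> tags
        self._href = None       # href of the <a> element currently open
        self._skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            self._href = dict(attrs).get("href")
            if self._href:
                self.website_set.append(self._href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.text_set.append((text, self._href))


def extract_sets(source_code):
    """Return (text set, website set) for one page's source code."""
    parser = TextAndLinkExtractor()
    parser.feed(source_code)
    parser.close()
    return parser.text_set, parser.website_set
```

Text that appears outside any hyperlink is paired with `None`, which corresponds to the static text of the text set; text inside a hyperlink corresponds to the directory text of a website.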
Constructing an inducible text keyword library, identifying each inducible text in a text set by a keyword filtering method, counting the proportion of the inducible text in the text set, comparing the proportion with a set proportion threshold, closing the current webpage when the proportion of the inducible text in the text set exceeds the set proportion threshold, and otherwise judging whether each inducible text is a directory text of a website or not.
Specifically, the keyword filtering method is as follows: each text in the text set is compared with each keyword in the inducible text keyword library; if a text in the text set matches a keyword in the library, that text is determined to be an inducible text.
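A minimal sketch of the keyword filtering and proportion check is given below. The case-insensitive substring match, the example keywords, and the function names are assumptions made for illustration; the source does not specify how a text is matched against a keyword.

```python
def find_inducible_texts(text_set, keyword_library):
    """Return the texts that contain at least one keyword (case-insensitive)."""
    keywords = [k.lower() for k in keyword_library]
    return [t for t in text_set if any(k in t.lower() for k in keywords)]


def should_close_page(text_set, keyword_library, ratio_threshold):
    """True when the share of inducible texts exceeds the set threshold."""
    if not text_set:
        return False
    inducible = find_inducible_texts(text_set, keyword_library)
    return len(inducible) / len(text_set) > ratio_threshold


# Hypothetical page texts and keyword library.
texts = ["Win a free prize now", "Contact us", "About", "Claim your free gift"]
keywords = ["free prize", "free gift"]
inducible = find_inducible_texts(texts, keywords)
```

Here two of the four texts match, so the proportion is 0.5; the page would be closed for any set threshold below 0.5 and otherwise passed on to the directory-text judgment.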
In a preferred embodiment, the specific steps of judging whether an inducible text is the directory text of a website are: acquiring the position of the inducible text in the source code, and using a URL parsing function to identify whether a website exists at that position in the source code; if not, the inducible text is static text, and marking and shielding operations are performed on the static text; if a website exists, the inducible text is the directory text of that website, and the website is located and shielded.
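The judgment above can be illustrated with the standard-library `urllib.parse.urlparse` acting as the URL parsing function. The sketch assumes that the hyperlink target found at the position of the inducible text (or `None` if there is none) has already been extracted, and that only absolute `http`/`https` targets count as a website; both are assumptions, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def classify_inducible_text(href):
    """Return 'directory' if a website is attached to the text, else 'static'."""
    if not href:
        return "static"
    parts = urlparse(href)
    # Assumption: a website is present only for absolute http(s) URLs.
    if parts.scheme in ("http", "https") and parts.netloc:
        return "directory"
    return "static"
```

Under this assumption, a target such as `javascript:void(0)` is treated as static text because it does not resolve to a website.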
Step three: screening of key marked websites: when the directory text of a website is inducible text, removing the websites corresponding to the inducible text from the website set to obtain the remaining websites, analyzing the suspicion degree of each remaining website to obtain the abnormal websites and the key marked websites, and shielding the abnormal websites.
In a preferred embodiment, the suspicion degree of each remaining website is analyzed as follows: C1: acquiring the protocol of each remaining website; if the protocol of a remaining website is HTTPS, acquiring the content of its protocol certificate, which includes the issuer, the domain name, and the expiration date, and verifying the protocol certificate compliance ε; otherwise, recording the protocol certificate compliance of that remaining website as 1. This yields the protocol certificate compliance of each remaining website, εk = ε or 1, where k is the index of the remaining website, k = 1, 2, ..., u.
The compliance of a website protocol certificate is verified as follows: the issuer field in the website protocol certificate is checked to confirm whether the certificate was issued by a set certificate authority; if the issuing authority can be identified, the issuer content of the certificate meets the requirement. The domain name field in the certificate is then checked and matched against the domain name of the currently queried page website; if the match succeeds, the domain name content of the certificate meets the requirement. The expiration date of the certificate is then compared with the current date; if the expiration date is after the current date, the date content of the certificate meets the requirement, and ε1 is taken as the compliance of the website protocol certificate. If any of the above checks fails, ε2 is taken as the compliance of the website protocol certificate. Thus, when the website protocol is HTTPS, the compliance of the website protocol certificate is ε = ε1 or ε2.
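The three certificate checks can be sketched as below. The certificate is represented as a plain dictionary with hypothetical keys `issuer`, `domain`, and `expires`; how the certificate is fetched is outside this sketch. The values passed for ε1 and ε2 are placeholders, since the source leaves them as set values.

```python
from datetime import date


def certificate_compliance(protocol, cert, page_domain, trusted_issuers,
                           today, eps1, eps2):
    """Return the protocol certificate compliance of one remaining website."""
    if protocol.lower() != "https":
        return 1  # per the text, non-HTTPS websites are recorded as 1
    checks_pass = (
        cert.get("issuer") in trusted_issuers          # issuer check
        and cert.get("domain") == page_domain          # domain name check
        and cert.get("expires") is not None
        and cert["expires"] > today                    # expiration date check
    )
    return eps1 if checks_pass else eps2


# Hypothetical certificate and set values.
cert = {"issuer": "Example CA", "domain": "example.com",
        "expires": date(2030, 1, 1)}
eps = certificate_compliance("https", cert, "example.com", {"Example CA"},
                             date(2025, 1, 1), eps1=0.9, eps2=0.3)
```

An exact string comparison is used for the domain name here; matching of wildcard certificates is not specified in the source and is not attempted.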
C2: simulating user behavior with an automated testing tool to trigger the redirections of each remaining website, obtaining the number of redirections and the redirection paths of each remaining website, and calculating the redirection-path suspicion degree of each remaining website. In this calculation, ρ0 is the set suspicion degree adjustment coefficient, M is the number of redirections of the remaining website, M' is the set threshold for the number of redirections, and ρk' is the redirection path validity influence weight of the k-th remaining website.
C3: calculating the suspicion degree of each remaining website. In this calculation, ρ' and ε' are the set reference values of the redirection-path suspicion degree and the protocol certificate compliance, respectively, λ1 and λ2 are the set weight proportions corresponding to the redirection-path suspicion degree and the protocol certificate compliance, respectively, and e is the natural constant.
In a preferred embodiment, the redirection path validity influence weight of a remaining website is analyzed as follows: obtaining the target URL of each redirection path of the remaining website, sending a simulated request to each target URL with a network monitoring tool, obtaining the returned response content through the interface provided by the tool, and storing the response content as variable indicators, which include the IP address corresponding to the website domain name, the HTTP status code of the website, and the content returned by the URL.
The target URL of each redirection path of a remaining website is obtained using browser developer tools.
Verification rules are designed according to the expected content, and each variable indicator is analyzed and verified to obtain its verification result, which is either valid or invalid.
If the verification result of a variable indicator is valid, its influence weight is recorded as 1; otherwise it is recorded as 0. The influence weights of the variable indicators of a path are added to obtain the comprehensive influence weight of that path, and the comprehensive influence weights of all paths are added to obtain the redirection path validity influence weight of the website.
The verification of each variable indicator involves the following analysis: E1: before the URL is accessed, the domain name of the URL is resolved into an IP address; if the IP address corresponding to the URL cannot be resolved, the URL is considered invalid.
E2: the HTTP status code of the website is acquired and analyzed to determine whether it is valid.
For example, common HTTP status codes include 200, 404, and 500, where 200 indicates that the request succeeded, 404 indicates that the page does not exist, and 500 indicates a server error. If the returned status code is 200, the URL is considered valid, and the URL connection duration is further obtained; if the connection duration exceeds the set duration, the connection cannot be established and the URL is invalid.
E3: the content returned by the URL is acquired and it is judged whether the content meets expectations; if not, the URL is invalid. For example, if an HTML page is expected but an error message or another type of content is actually returned, the URL is deemed invalid.
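The 0/1 weighting of checks E1 to E3 and the summation over redirection paths can be sketched as follows. The sketch assumes the three verification results for each path are already available as booleans; the function names and the sample values are hypothetical.

```python
def status_check(status_code, connect_seconds, max_seconds):
    """E2: valid only for HTTP 200 with a connection time within the limit."""
    return status_code == 200 and connect_seconds <= max_seconds


def path_weight(ip_resolved, status_ok, content_ok):
    """Sum of the 0/1 weights of the three variable indicators of one path."""
    return int(ip_resolved) + int(status_ok) + int(content_ok)


def redirect_validity_weight(paths):
    """Sum of the comprehensive weights over all redirection paths of a site."""
    return sum(path_weight(*p) for p in paths)


# Two hypothetical redirection paths: (E1 result, E2 result, E3 result).
paths = [
    (True, status_check(200, 0.4, 3.0), True),    # all three checks valid
    (True, status_check(404, 0.2, 3.0), False),   # only DNS resolution valid
]
weight = redirect_validity_weight(paths)
```

For these two paths the comprehensive weights are 3 and 1, so the redirection path validity influence weight of the website is 4.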
In a preferred embodiment, the key marked websites are extracted as follows: setting a suspicion degree threshold range; if the suspicion degree of a remaining website is smaller than the minimum value of the threshold range, marking the remaining website as an abnormal website and intercepting and shielding it; if the suspicion degree of a remaining website falls within the threshold range, marking the remaining website as a key marked website.
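The threshold rule can be written as a small classifier. The sketch follows the text literally: values below the minimum of the range are abnormal and values inside the range are key marked. The source does not say how a value above the maximum is handled, so the `"unmarked"` result for that case is an assumption, as are the function name and the sample bounds.

```python
def classify_site(suspicion, range_min, range_max):
    """Classify one remaining website by its suspicion degree."""
    if suspicion < range_min:
        return "abnormal"      # intercepted and shielded
    if suspicion <= range_max:
        return "key_marked"    # monitored when the user enters it
    return "unmarked"          # assumption: case not specified in the text


# Hypothetical suspicion degrees for three remaining websites.
labels = [classify_site(s, 0.3, 0.7) for s in (0.1, 0.5, 0.9)]
```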
Step four: safety monitoring: after a user enters the page corresponding to a key marked website, counting the number of key marked websites in the entered page, and if the number exceeds a set value, closing the entered page.
The invention analyzes the website paths in a page to judge their hidden suspicion degree, and key-marks the websites whose suspicion degree exceeds the path threshold. When a user accesses a key marked website, the website content is identified again, which makes the user more alert when accessing it and helps the user better identify and avoid potentially risky websites, thereby improving the user's security and trust. Re-identifying the website content also provides the user with more accurate, real, and useful information and optimizes the browsing experience.
Step five: irregular time setting: acquiring the system security log of the browser, and setting an irregular extraction rule according to the user's access pattern in the browser.
In a preferred embodiment, the irregular extraction rule is set as follows: F1: determining an initial time range, generating a random time point within the initial time range using a random function, and taking the random time point as the start time point t_start of the timed task execution.
F2: at the end time of the initial time range, extracting from the system security log the peak access time within the initial time range, denoted t_peak, and taking t1 = t_peak + |t_peak - t_start| as the first irregular time.
F3: taking t_peak and t1 as the start access time and the end access time of the next time range, extracting from the system security log, at the end access time of the next time range, the peak access time t_peak' within the next time range, and taking t2 = t1 + (t_peak' - t_peak) as the second irregular time; irregular times continue to be set according to this rule for the duration of the user's access.
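Steps F1 to F3 reduce to two short formulas, sketched below with times expressed as minutes from an arbitrary origin. The function names and the sample peak times are hypothetical; in the method itself the peak access times come from the system security log.

```python
import random


def pick_start_time(range_start, range_end, rng):
    """F1: random start time point t_start within the initial time range."""
    return rng.uniform(range_start, range_end)


def first_irregular_time(t_start, t_peak):
    """F2: t1 = t_peak + |t_peak - t_start|."""
    return t_peak + abs(t_peak - t_start)


def second_irregular_time(t1, t_peak, t_peak_next):
    """F3: t2 = t1 + (t_peak' - t_peak)."""
    return t1 + (t_peak_next - t_peak)


# Worked example with fixed, hypothetical values (minutes).
t_start, t_peak = 10, 25
t1 = first_irregular_time(t_start, t_peak)          # 25 + |25 - 10| = 40
t_peak_next = 33                                    # peak within [25, 40]
t2 = second_irregular_time(t1, t_peak, t_peak_next) # 40 + (33 - 25) = 48
```

Because each new time depends on the observed peak and, for the first one, on a random start, the extraction moments do not follow a fixed period.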
The peak access time is obtained as follows: the access amount at each time point within the initial time range is extracted from the access log and compared with the preset access amount. When the access amount at a time point is greater than the preset access amount, that time point is marked as an initial time; the access amounts at the subsequent time points are then compared with the preset access amount in sequence, and the time point at which the access amount falls below the preset access amount is taken as the terminal time.
The interval between the initial time and the terminal time is taken as a sub-time period, and in this way each sub-time period within the initial time range is obtained. The access amounts at the center times of the sub-time periods are compared with one another, the maximum access amount is selected, and the center time of the corresponding sub-time period is recorded as the peak access time. When t_peak' = t_peak, the sub-time period whose center-time access amount ranks second is taken instead, and its center time is recorded as the peak access time.
The center time of a sub-time period is the time at the midpoint of that sub-time period.
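A minimal sketch of the peak access time search is shown below. It assumes the access log has already been reduced to evenly spaced `(time, access_amount)` samples, approximates the center time by the middle sample of each sub-time period, and ignores a sub-time period that is still open at the end of the range; these simplifications and the function name are assumptions.

```python
def peak_access_time(samples, preset):
    """samples: list of (time, access_amount) sorted by time."""
    periods = []      # (start index, end index) of each sub-time period
    start = None
    for i, (_, amount) in enumerate(samples):
        if start is None and amount > preset:
            start = i                     # initial time
        elif start is not None and amount < preset:
            periods.append((start, i))    # terminal time
            start = None
    best = None
    for s, e in periods:
        mid = (s + e) // 2                # middle sample as the center time
        t, amount = samples[mid]
        if best is None or amount > best[1]:
            best = (t, amount)
    return best[0] if best else None


# Hypothetical samples: two sub-time periods, [1, 4] and [5, 7].
samples = [(0, 5), (1, 12), (2, 20), (3, 15), (4, 4),
           (5, 11), (6, 30), (7, 3)]
peak = peak_access_time(samples, preset=10)
```

In this example the center samples are time 2 (access amount 20) and time 6 (access amount 30), so time 6 is taken as the peak access time.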
The access amount refers to the user's access behavior on websites, including the user's device information, IP address, and accessed pages or resources. During peak access periods, activity is frequent and potential risks are higher, so timely monitoring and response are very important; therefore, when the irregular time interval is set, the peak access time is preferentially selected for extracting the security log.
Step six: log analysis: performing irregular-time analysis on the system security log, detecting abnormal events in the system security log, and judging whether an autonomous searching and killing operation needs to be executed on the browser.
In a preferred embodiment, performing irregular-time analysis on the system security log includes: acquiring the system security log at the time points given by the irregular extraction rule, extracting the download events and abnormal login events within the current time range from the system security log, and analyzing the anomaly coefficient of the download event and the anomaly coefficient of the abnormal login event, denoted δ1 and δ2, respectively.
The abnormal event influence coefficient δ of the system security log is then evaluated, where τ is the correction factor of the abnormal event influence coefficient. When δ is greater than δ0, it is judged that the browser needs to execute the autonomous searching and killing operation, where δ0 denotes the set abnormal event influence coefficient threshold.
In a preferred embodiment, the anomaly coefficient of a download event is analyzed as follows: identifying whether the download event is an autonomous download action of the user; if the download event is a webpage-initiated download action, acquiring the download source website, issuing an early warning for the download source website, and taking ψ0 as the anomaly coefficient of the download event.
Specifically, by analyzing the operation flow of the user in the website, it can be determined whether the download event is associated with the current operation of the user. For example, if the user triggers a download event after clicking a button, then the download event is determined to be an autonomous download action by the user.
If the download event is an autonomous download action of the user, extracting the webpage upload data of the downloaded file corresponding to the download event and the user-side download data, and comparing them to calculate the health index ψ of the downloaded file package of the download event. In this calculation, s1 and s2 are the download start time and download end time in the user download data, respectively, B is the downloaded file size, v is the set normal download duration per unit file size, Δv is the set allowable download speed error, and ζ is the degree of abnormality corresponding to the downloaded file size in the user download data. ψ is then taken as the anomaly coefficient of the download event, so the anomaly coefficient of the download event is δ1 = ψ or ψ0.
The webpage uploading data of the downloaded file is the file uploading size, and the user side downloading data comprises the downloaded file size, the downloading starting time and the downloading ending time.
The degree of abnormality corresponding to the downloaded file size in the user download data is obtained as follows: the file upload size is denoted B'. If B > B', bundled files exist in the downloaded file, and the degree of abnormality corresponding to the downloaded file size in the user download data is recorded as ζ1; the content of the downloaded file package is analyzed with a security analysis tool, the hidden files are identified, their locations are tracked by searching registry entries, file paths, and processes, and the hidden files are then automatically deleted. If B < B', the downloaded file is incomplete, the degree of abnormality corresponding to the downloaded file size in the user download data is recorded as ζ2, and early warning information is sent to the background system of the file download end. This yields the degree of abnormality corresponding to the downloaded file size in the user download data, ζ = ζ1 or ζ2.
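The size comparison can be sketched as a small function that returns both the degree of abnormality and the follow-up action named in the text. The values used for ζ1 and ζ2 are placeholders for the set values, and the handling of B = B' (no abnormality, returned as `None`) is an assumption because the source does not cover that case.

```python
def size_abnormality(download_size, upload_size, zeta1, zeta2):
    """Compare downloaded size B with uploaded size B'.

    Returns (degree of abnormality, follow-up action).
    """
    if download_size > upload_size:
        # Bundled files suspected: scan the package and delete hidden files.
        return zeta1, "scan_and_delete_hidden_files"
    if download_size < upload_size:
        # Incomplete file: warn the background system of the download end.
        return zeta2, "warn_download_backend"
    return None, "none"  # assumption: equal sizes are not abnormal


# Hypothetical sizes in bytes and placeholder set values.
bundled = size_abnormality(1_200_000, 1_000_000, zeta1=0.8, zeta2=0.4)
incomplete = size_abnormality(700_000, 1_000_000, zeta1=0.8, zeta2=0.4)
```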
In a preferred embodiment, the anomaly coefficient of an abnormal login event is analyzed as follows: recording each abnormal login event that occurred before the start access time of the current time range as a historical login event, acquiring the login information of the abnormal login event at the current time point, comparing it with the login information of each historical login event, calculating the login address evaluation coefficient of each historical login event, and counting the number of historical login events whose coefficient exceeds the set login address evaluation coefficient threshold, denoted Y.
The login device is extracted from the login information of the abnormal login event at the current time point and compared with the user's commonly used devices, and the anomaly coefficient δ2 of the abnormal login event is calculated. In this calculation, Y' is the number of historical login events, σ is the set deviation correction factor corresponding to the anomaly coefficient of the abnormal login event, U is a set constant greater than 2, P denotes the state in which the login device can be matched with a commonly used device of the user, β1 is the login device influence weight set for the P state, and β2 is the login device influence weight set for the non-P state, in which the login device cannot be matched.
It should be noted that the login address evaluation coefficient of a historical login event is calculated as follows: extracting the historical login location from the login information of the historical login event, extracting the current login location from the login information of the abnormal login event at the current time point, comparing the two to obtain the geographic distance L, and obtaining the login address evaluation coefficient of the historical login event by the calculation formula, in which E1 denotes that the historical login location and the current login location are in different countries, and E2 denotes that they are in the same country.
A commonly used device of the user is a device on which the user has entered an SMS verification code.
The invention makes irregular searching-and-killing judgments on the browser according to the irregular time setting rule, so that the browser content can be updated in time and malicious software can be found and cleared, preventing its spread in the browser and thereby reducing the user's risk of network attack.
The foregoing is merely illustrative and explanatory of the principles of this invention, as various modifications and additions may be made to the specific embodiments described, or similar arrangements may be substituted by those skilled in the art, without departing from the principles of this invention or beyond the scope of this invention as defined in the claims.