Disclosure of Invention
The invention provides a website security vulnerability detection method and system for improving and optimizing the security detection function of a website.
In order to solve the above technical problems, an embodiment of the present invention provides a website security vulnerability detection method, including:
Acquiring a to-be-processed Webshell file, and dividing the code in the to-be-processed Webshell file into a plurality of code segments according to the coding rule of the to-be-processed Webshell file.
Screening a plurality of interactable code blocks from the acquired page code of the target website, and inserting the interactable code blocks between the code segments to obtain a first Webshell file, wherein the insertion process is configured so that each selected code segment interacts with its corresponding interactable code block in a coupled manner.
Screening the first Webshell file based on a character string feature matching rule, and rewriting each risk character string obtained by the screening to obtain a second Webshell file.
Hiding the variable transfer process in the second Webshell file by using a data flow graph obfuscation method to obtain a third Webshell file.
Interacting the third Webshell file with a selected detection tool, correcting the detection tool based on the interaction result, and detecting security vulnerabilities of the target website with the corrected detection tool.
Further, the dividing the codes in the to-be-processed Webshell file into a plurality of code segments according to the coding rule of the to-be-processed Webshell file includes:
Extracting code data from the to-be-processed Webshell file, and matching specified code in the code data by using a regular expression.
Dividing the code data according to the matching result to obtain a plurality of code segments.
Further, after the code data is divided according to the matching result to obtain a plurality of code segments, the method further includes:
Performing logic verification on each divided code segment, and correcting the division result of each code segment according to the verification result.
Further, the inserting each of the interactable code blocks between each of the code segments comprises:
Determining an interaction interface according to the function and logical relationship of each interactable code block.
Analyzing the data information that each code segment needs to exchange and transfer.
Establishing a matching relationship between each code segment and each interaction interface according to the data information.
Coupling each code segment with each interactable code block according to the matching relationship.
Further, the character string feature matching rules include word matching rules, sentence matching rules and frequency detection rules.
The screening of the first Webshell file based on the character string feature matching rule, and the rewriting of each risk character string obtained by the screening, includes:
Screening system functions in the risk character strings from the first Webshell file, and dynamically rewriting the function names of the system functions through string operations.
Screening nested functions in the risk character strings from the first Webshell file, extracting the inner function of each nested function separately, and calling the inner function through a new variable.
Re-encoding the risk character strings in the first Webshell file by means of encryption mapping.
Further, the recoding the risk character string in the first Webshell file by means of encryption mapping includes:
Encrypting each risk character string in the first Webshell file by using an encryption mapping rule, and performing encoding conversion on each encrypted risk character string.
Further, the encoding conversion rules comprise ASCII encoding rules, Unicode encoding rules, GBK/GB2312 encoding rules and Base64 encoding rules.
Further, the hiding of the variable transfer process in the second Webshell file by using a data flow graph obfuscation method includes:
Performing assignment transformation on the variables in the second Webshell file by using control statements, so as to hide explicit assignment statements.
Writing part of the variable information into the file attribute information of the second Webshell file by means of file steganography.
Further, the writing of part of the variable information into the file attribute information of the second Webshell file by means of file steganography includes:
Writing the variable information as hidden data into an alternate data stream of the Webshell file by using the alternate data stream feature of the NTFS file system.
Another embodiment of the present invention provides a website security vulnerability detection system, including:
The code segmentation module is used for acquiring a to-be-processed Webshell file and dividing the code in the to-be-processed Webshell file into a plurality of code segments according to the coding rule of the to-be-processed Webshell file.
The code interaction module is used for screening a plurality of interactable code blocks from the acquired page code of the target website, and inserting the interactable code blocks between the code segments to obtain a first Webshell file, wherein the insertion process is configured so that each selected code segment interacts with its corresponding interactable code block in a coupled manner.
The character string rewriting module is used for screening the first Webshell file based on a character string feature matching rule, and rewriting each risk character string obtained by the screening to obtain a second Webshell file.
The variable hiding module is used for hiding the variable transfer process in the second Webshell file by using a data flow graph obfuscation method to obtain a third Webshell file.
The vulnerability detection module is used for interacting the third Webshell file with a selected detection tool, correcting the detection tool based on the interaction result, and detecting security vulnerabilities of the target website with the corrected detection tool.
Compared with the prior art, the embodiments of the invention have at least one of the following beneficial effects:
(1) In view of the limitations of existing Webshell detection technology, the invention rewrites Webshell code in ways designed against Opcode/Bytecode-based detection, string-matching-based detection, CFG-based detection, and DFG- and taint-based detection, so that the rewritten samples comprehensively bypass the static Webshell detection of the prior art.
(2) By analyzing the strategies and techniques of Webshell attacks, the mechanisms and limitations of Webshell detection tools are examined in depth, which provides a basis for developing more effective detection and defense strategies and, ultimately, guidance for optimizing and improving static detection tools.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention, and the purpose of these embodiments is to provide a more thorough and complete disclosure of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present application, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", "a third", etc. may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present application, unless explicitly stated or limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected, mechanically connected, electrically connected, directly connected, indirectly connected via an intervening medium, or in communication between two elements. The terms "vertical," "horizontal," "left," "right," "upper," "lower," and the like are used herein for descriptive purposes only and not to indicate or imply that the apparatus or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the application. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. The specific meaning of the above terms in the present application will be understood in specific cases by those of ordinary skill in the art.
In the description of the present application, it should be noted that all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art unless defined otherwise. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application, as the particular meaning of the terms described above in the present application will be understood to those of ordinary skill in the art in the detailed description of the application.
An embodiment of the present invention provides a website security vulnerability detection method. Referring to Fig. 1, which is a flowchart of the steps of the website security vulnerability detection method in one embodiment of the present invention, the method includes steps S11 to S15:
Step S11, acquiring a to-be-processed Webshell file, and dividing the code in the to-be-processed Webshell file into a plurality of code segments according to the coding rule of the to-be-processed Webshell file.
The code segmentation process includes extracting code data from the to-be-processed Webshell file, matching specified code (such as function definitions, conditional statements, loop statements and the like) in the code data by using a regular expression, and locating the key parts of the code according to the matching result, so as to divide the code data into a plurality of code segments.
Alternatively, code segmentation can be performed with a static code analysis tool. The tool scans and analyzes the to-be-processed Webshell file, identifies the code logic, function calls, variable declarations and the like in the file, and divides the code into segments accordingly.
After the code is divided, logic verification is performed on each divided code segment to check that the division is logically sound, and the division result of each code segment is corrected according to the verification result, so that each code segment correctly expresses the logic and functions of the original Webshell file.
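The regular-expression segmentation and the subsequent logic verification can be pictured with the following minimal sketch. It is an illustration only, not the embodiment's implementation: the boundary pattern, the brace-balance check used as a stand-in for "logic verification", and the names `BOUNDARY`, `split_segments`, `braces_balanced` and `correct_segments` are all assumptions introduced here.

```python
import re

# Assumed boundary pattern: a line that starts a function definition,
# a conditional statement, or a loop statement in a PHP-like script.
BOUNDARY = re.compile(r"^\s*(?:function\b|if\b|while\b|for\b|foreach\b)", re.MULTILINE)


def split_segments(source: str) -> list[str]:
    """Cut the source at the start of every line matched by BOUNDARY."""
    cuts = sorted({0, len(source), *(m.start() for m in BOUNDARY.finditer(source))})
    return [source[a:b] for a, b in zip(cuts, cuts[1:]) if source[a:b]]


def braces_balanced(segment: str) -> bool:
    """Crude stand-in for logic verification: every '{' has a matching '}'.

    This ignores braces inside string literals and comments.
    """
    depth = 0
    for ch in segment:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def correct_segments(segments: list[str]) -> list[str]:
    """Merge each unbalanced segment with the following ones until it balances."""
    fixed, buffer = [], ""
    for segment in segments:
        buffer += segment
        if braces_balanced(buffer):
            fixed.append(buffer)
            buffer = ""
    if buffer:  # keep any trailing remainder so no code is lost
        fixed.append(buffer)
    return fixed


sample = (
    "$a = 1;\n"
    "function f($x) {\n"
    "  if ($x) {\n"
    "    return 1;\n"
    "  }\n"
    "  return 0;\n"
    "}\n"
    "echo f($a);\n"
)
raw = split_segments(sample)
corrected = correct_segments(raw)
```

In this sketch the raw split cuts the function body away from its header, and the correction pass merges the two pieces back together, which is the kind of adjustment the verification step is meant to trigger. Concatenating the segments always reproduces the original text.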
Step S12, a plurality of interactable code blocks are screened from the acquired page codes in the target website, and the interactable code blocks are inserted between the code segments to obtain a first Webshell file, wherein the insertion process is configured to interact the selected code segments with the corresponding interactable code blocks in a coupling mode.
This embodiment processes the Webshell file with respect to four kinds of detection: detection based on Opcode/Bytecode, detection based on the CFG (control flow graph), detection based on character string matching, and detection based on the DFG (data flow graph). For detection based on Opcode/Bytecode, the processing adopted in this embodiment is either to insert the Webshell file code into the page code of the target website or to insert the page code of the target website into the Webshell file code. Taking the latter as an example, a plurality of interactable code blocks are screened from the acquired page code of the target website, and each interactable code block is inserted between the code segments of the to-be-processed Webshell file.
Although this approach may bypass detection at the Opcode level, the inserted code and the code blocks of the original to-be-processed Webshell file may have no semantic association. Two pieces of code that do not interact can be separated by means of the CFG (control flow graph) or the like and then examined by a detection tool. Therefore, with respect to CFG-based detection, the inserted code and the original code need to interact in a coupled manner during insertion, without affecting the original control flow, so that the two are harder to separate. The specific process adopted in this embodiment is as follows:
An interaction interface is determined according to the function and logical relationship of each interactable code block; the data information that each code segment needs to exchange and transfer is analyzed; a matching relationship between each code segment and each interaction interface is established according to the data information; and each code segment is coupled with each interactable code block according to the matching relationship.
In addition, to further obfuscate the code and prevent the detection tool from screening out abnormal code, this embodiment may also introduce a random algorithm that generates non-deterministic input, which interferes with CFG-based simplification and detection of the code. Specifically, the method adopted in this embodiment is as follows:
A random character string is added to the output part of the program. During subsequent execution of the Webshell, the output can be accessed multiple times through a remote tool, and the real output of the program is determined from the part that remains unchanged across the access results. For example, after the Webshell program bypasses the initial detection and starts to execute, suppose that accessing a certain code segment multiple times returns character strings such as "resultlyr", "result123" and "resultexc". Once enough strings have been collected, comparing them shows that the actual result of executing the code segment is the character string "result", and that the trailing characters were appended to the actual result by the random algorithm.
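The comparison step in the example above amounts to taking the longest common prefix of the collected responses. A minimal sketch, assuming the random characters are always appended after the real output (the function name `stable_part` is introduced here for illustration):

```python
import os


def stable_part(responses: list[str]) -> str:
    """Return the leading part shared by every response.

    os.path.commonprefix compares character by character, so it works on
    arbitrary strings, not only on file paths.
    """
    return os.path.commonprefix(responses)


observed = ["resultlyr", "result123", "resultexc"]
recovered = stable_part(observed)  # "result"
```

If two random suffixes happen to begin with the same character, the common prefix is one character too long, which is why the text calls for a certain number of responses before the comparison is trusted.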
Step S13, screening the first Webshell file based on a character string feature matching rule, and rewriting each risk character string obtained by the screening to obtain a second Webshell file.
Detection based on character string matching relies on character string feature matching rules, which include word matching rules, sentence matching rules, frequency detection rules and the like. This embodiment uses the same rules to screen out the risk character strings and then rewrites them so that they are no longer matched.
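Seen from the detector's side, the three rule families can be sketched as a small line-by-line scanner. This is an assumed illustration, not the rule set of any real tool: the word list, the single sentence pattern, the per-line call threshold, and the names `WORD_RULES`, `SENTENCE_RULES`, `MAX_CALLS_PER_LINE` and `scan` are all hypothetical.

```python
import re

# Hypothetical rules, chosen only to make the three families concrete.
WORD_RULES = {"eval", "system", "exec", "base64_decode"}
SENTENCE_RULES = [re.compile(r"eval\s*\(\s*\$_(?:GET|POST|REQUEST)\b")]
MAX_CALLS_PER_LINE = 3  # frequency rule: more calls than this on one line is flagged

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
CALL = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\s*\(")


def scan(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, rule family, detail) for every rule hit."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Word rule: a single risky identifier appears on the line.
        for word in IDENTIFIER.findall(line):
            if word in WORD_RULES:
                findings.append((lineno, "word", word))
        # Sentence rule: a risky combination of function and argument.
        for pattern in SENTENCE_RULES:
            if pattern.search(line):
                findings.append((lineno, "sentence", pattern.pattern))
        # Frequency rule: too many call-like tokens packed into one line.
        calls = len(CALL.findall(line))
        if calls > MAX_CALLS_PER_LINE:
            findings.append((lineno, "frequency", str(calls)))
    return findings


hits = scan('<?php eval($_POST["c"]); ?>')
```

Each finding is a risk character string in the sense used above. Because every rule keys on literal text, a scanner of this kind is sensitive to how names and statements are spelled, which is the limitation that step S13 probes.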
The rewriting of each risk character string obtained by the screening includes the following:
Screening system functions in the risk character strings from the first Webshell file, and dynamically rewriting the function names of the system functions through string operations.
Screening nested functions in the risk character strings from the first Webshell file, extracting the inner function of each nested function separately, and calling the inner function through a new variable.
Re-encoding the risk character strings in the first Webshell file by means of encryption mapping. Specifically, each risk character string in the first Webshell file is encrypted by using an encryption mapping rule, and each encrypted risk character string is then converted by using encoding rules such as ASCII, Unicode, GBK/GB2312 and Base64 encoding rules.
Furthermore, the risk character strings may also be rewritten to bypass string-matching-based detection in the following ways:
Hiding global variables, that is, avoiding detection of the related variable names by dynamically generating variable names through string operations or by wrapping the original variables in newly defined variables.
Renaming function parameters, that is, avoiding detection of combinations of related functions and variables by extracting the original function parameters, assigning them to new variables, and passing the new variables as parameters.
Limiting the number of operations in a single line of code, that is, avoiding frequency-based detection by splitting overlong single-line code in the AST (abstract syntax tree).
Adjusting the position of each line of the program in the code, that is, avoiding position-based detection by interleaving normal code between the lines.
Step S14, hiding the variable transfer process in the second Webshell file by using a data flow graph obfuscation method to obtain a third Webshell file.
Some detection tools use taint tracking to follow user-controlled input and build a data flow graph (DFG), and then check the sink points in the graph. In this way the sources and destinations of the data are analyzed and Webshells are detected.
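The taint analysis described above can be pictured as a reachability query on the data flow graph: a sink is reported when some user-controlled source can reach it. The sketch below is a toy model under that assumption; the graph encoding, the node labels and the function name `tainted_sinks` are introduced here for illustration and do not come from any particular tool.

```python
from collections import deque


def tainted_sinks(
    edges: dict[str, list[str]], sources: set[str], sinks: set[str]
) -> set[str]:
    """Return the sinks reachable from any source by following data flow edges."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for successor in edges.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen & sinks


# Toy data flow graph: an edge "a -> b" means the value of a flows into b.
dfg = {
    "$_POST['c']": ["$cmd"],
    "$cmd": ["system()"],
    "$greeting": ["echo"],
}
flagged = tainted_sinks(dfg, sources={"$_POST['c']"}, sinks={"system()", "echo"})
```

The result depends entirely on which edges the analyzer manages to recover. A value that moves through a channel the analyzer does not model never gets an edge, so the sink behind it is not flagged, and this is the gap that step S14 is intended to expose.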
With respect to detection based on the DFG (data flow graph), control statements such as if, while and for can be used to perform assignment transformation on the variables in the second Webshell file, so that explicit assignment statements are hidden and detection is avoided.
For variable data that occupies little memory, the alternate data stream feature of the NTFS file system can be used: this part of the variable information is written, by means of file steganography, as hidden data into the file attribute information of the second Webshell file, the CPU cache, or other alternate data streams, so as to avoid detection.
Step S15, interacting the third Webshell file with a selected detection tool, correcting the detection tool based on the interaction result, and detecting security vulnerabilities of the target website with the corrected detection tool.
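One way to read the interaction in step S15 is as an evaluation loop: the detection tool is run over a set of known-malicious samples, and the samples it misses indicate where the tool needs correction. The sketch below assumes the detection tool can be wrapped as a function that returns True when it flags a sample. The names `evaluate` and `toy_detector`, and the sample data, are hypothetical.

```python
from typing import Callable


def evaluate(
    detector: Callable[[str], bool], samples: list[tuple[str, str]]
) -> tuple[float, list[str]]:
    """Run the detector over (name, text) samples that are all known to be malicious.

    Returns the detection rate and the names of the missed samples.
    """
    missed = [name for name, text in samples if not detector(text)]
    total = len(samples)
    rate = (total - len(missed)) / total if total else 0.0
    return rate, missed


def toy_detector(text: str) -> bool:
    """Stand-in for a real detection tool: flags a literal 'eval(' only."""
    return "eval(" in text


samples = [
    ("original", '<?php eval($_POST["c"]); ?>'),
    ("rewritten", "<?php /* same behavior, different spelling */ ?>"),
]
rate, missed = evaluate(toy_detector, samples)
```

The list of missed samples is the interaction result in this reading: each entry points at a rule or analysis stage of the tool that should be corrected before the tool is used on the target website.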
In view of the limitations of existing Webshell detection technology, the website security vulnerability detection method rewrites the Webshell code in ways designed against Opcode/Bytecode-based detection, string-matching-based detection, CFG-based detection, and DFG- and taint-based detection, so that the rewritten code comprehensively bypasses the static Webshell detection of the prior art. By analyzing the strategies and techniques of Webshell attacks, the mechanisms and limitations of Webshell detection tools are examined in depth, which provides a basis for developing more effective detection and defense strategies and, ultimately, guidance for optimizing and improving static detection tools.
An embodiment of the invention also provides a website security vulnerability detection system for executing the website security vulnerability detection method described above. Fig. 2 is a structural block diagram of the website security vulnerability detection system according to the embodiment of the invention, wherein the system comprises:
The code segmentation module 21 is configured to obtain a to-be-processed Webshell file, and segment a code in the to-be-processed Webshell file into a plurality of code segments according to a coding rule of the to-be-processed Webshell file.
The code interaction module 22 is configured to screen a plurality of interactable code blocks from the acquired page codes in the target website, insert each interactable code block between each code segment to obtain a first Webshell file, where the insertion process is configured to interact the selected code segment with each corresponding interactable code block in a coupling manner.
The character string rewriting module 23 is configured to screen the first Webshell file based on a character string feature matching rule, and rewrite each risk character string obtained by the screening to obtain a second Webshell file.
The variable hiding module 24 is configured to hide the variable transfer process in the second Webshell file by using a data flow graph obfuscation method to obtain a third Webshell file.
The vulnerability detection module 25 is configured to interact the third Webshell file with a selected detection tool, correct the detection tool based on the interaction result, and detect security vulnerabilities of the target website with the corrected detection tool.
The technical features and technical effects of the system provided by the embodiment of the present invention are the same as those of the method provided by the embodiment of the present invention, and are not repeated here. The modules in the system described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
The foregoing examples illustrate only a few embodiments of the invention. They are described in some detail, but are not thereby to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.