Disclosure of Invention
To solve the above problems, the present invention provides, in one aspect, a method for detecting security vulnerabilities in software source code and, in another aspect, a device for detecting security vulnerabilities in software source code, so that security vulnerabilities existing in software source code can be detected effectively.
The method for detecting the security vulnerability of the software source code comprises the following steps:
establishing an abstract syntax tree AST corresponding to a source code of software to be detected;
determining manipulatable points and risk points among the nodes of the established AST according to predefined manipulatable points and risk points;
searching the AST for an execution path from a manipulatable point to a risk point, and, if the risk point on the execution path can be controlled by the manipulatable point on the execution path, determining the execution path to be a potential risk execution path that may cause a security vulnerability.
In the method according to the invention, the manipulatable points are input class functions, and the risk points are execution class functions and/or assignment statements.
In the method according to the invention, the manipulatable parameter in each manipulatable point and the risk parameter in each risk point are further predefined.
In the method according to the invention, whether a risk point on the execution path can be controlled by a manipulatable point on the execution path is determined as follows:
if a risk parameter in the risk point on the execution path is contaminated by a manipulatable parameter in the manipulatable point on the execution path, the risk point is determined to be controllable by the manipulatable point.
In the method according to the invention, whether the risk parameter in the risk point on the execution path is contaminated by the manipulatable parameter in the manipulatable point on the execution path is determined as follows:
taking the manipulatable parameter in the manipulatable point on the execution path as an initial potentially manipulated variable (PEV);
determining each intermediate PEV on the execution path that is contaminated by the initial PEV;
judging whether the risk parameter in the risk point on the execution path is the initial PEV or an intermediate PEV, and if so, determining that the risk parameter is contaminated by the manipulatable parameter in the manipulatable point on the execution path.
In the method according to the invention, the intermediate PEVs contaminated by the initial PEV on the execution path are obtained by data flow analysis.
In the method according to the invention, obtaining the intermediate PEVs contaminated by the initial PEV on the execution path further uses control flow analysis.
The method according to the invention, wherein the method further comprises: generating a test report according to the determined potential risk execution path.
In the method according to the invention, generating a test report according to the determined potential risk execution path comprises:
if the potential risk execution path can be expressed directly by a test input and a test condition, generating a test script from the test input and the test condition; otherwise, generating a test report that records the potential risk execution path.
According to the method, the source code of the software to be detected is a high-level programming language source code or an assembly language source code obtained by decompiling executable program codes.
The device for detecting security vulnerabilities in software source code provided by the invention comprises a source code processing unit and a path analysis unit, wherein:
the source code processing unit is used for establishing the AST corresponding to the source code of the software to be detected and determining the manipulatable points and risk points among the nodes of the established AST according to predefined manipulatable points and risk points;
and the path analysis unit is used for searching for an execution path from a manipulatable point to a risk point among the nodes of the AST determined by the source code processing unit, and, if the risk point on the execution path can be controlled by the manipulatable point on the execution path, determining the execution path to be a potential risk execution path that may cause a security vulnerability.
In the device according to the invention, the source code processing unit comprises a configuration module, an analysis module, a node type positioning module and an AST recording module;
the configuration module is used for recording predefined manipulatable points and risk points;
the analysis module is used for performing lexical, syntactic and semantic analysis on a source code of the software to be detected and establishing AST;
the node type positioning module is used for determining the manipulatable points and risk points among the nodes of the AST established by the analysis module according to the predefined manipulatable points and risk points recorded in the configuration module, and for recording the AST and the determined manipulatable points and risk points in the AST recording module;
and the AST recording module is used for recording the AST and the determined manipulatable points and risk points and providing them to the path analysis unit.
In the device according to the invention, the source code processing unit further comprises a decompilation module for decompiling executable program code into assembly language source code and sending the assembly language source code to the analysis module.
In the device according to the invention, the path analysis unit comprises an execution path positioning module and a vulnerability positioning module;
the execution path positioning module is used for searching the AST for an execution path from a manipulatable point to a risk point and notifying the vulnerability positioning module of the execution path found;
and the vulnerability positioning module is used for determining, after receiving the notification from the execution path positioning module, whether the risk point on the execution path can be controlled by the manipulatable point on the execution path, and if so, determining the execution path to be a potential risk execution path.
The device further comprises a security test case generating unit, which is used for generating a test report according to the potential risk execution path determined by the path analysis unit.
It can be seen from the above solutions that the security vulnerability detection of the present invention is based on analyzing the source code to establish the AST. Since the process of establishing the AST is not limited to static lexical and syntactic analysis but also includes semantic analysis, the present invention analyzes the source code more comprehensively than automatic code security audit technology. Furthermore, the invention performs path analysis on the established AST to find execution paths on which a risk point can be controlled by a manipulatable point. Using such an execution path, an attacker can control the risk point by inputting carefully crafted data at the manipulatable point, thereby causing the software to malfunction or even crash. An execution path found by the invention on which the risk point can be controlled by the manipulatable point is therefore a security vulnerability. By searching for such paths in the AST established through comprehensive source code analysis, the invention can find various security vulnerabilities in software source code more effectively, and the results can serve as a basis for improving the software, thereby effectively preventing malicious manipulation of risk points and enhancing the security of the software source code.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and examples.
In practice, an attacker inputs carefully crafted data to make software execute along a particular path, ultimately causing security problems such as buffer overflow and code injection. Execution paths that threaten the security of software in this way are called potential risk execution paths (PVEP), and they are what attackers look for. The method and the device of the invention therefore detect security vulnerabilities by searching for potential risk execution paths in the software source code.
Based on the above consideration, the invention provides a scheme for detecting software security vulnerabilities. The scheme establishes an Abstract Syntax Tree (AST) by analyzing the software source code and determines the EPs and VPs among the nodes of the established AST according to predefined manipulatable points (EP) and risk points (VP). It then performs path analysis on the AST to find an execution path from an EP to a VP; if the VP on the execution path can be controlled by the EP on the execution path, the execution path found is determined to be a PVEP that may cause a security vulnerability.
Here, an EP is an entry point in the software source code. Through an EP, commands or data can be input to the software from outside to control its operating state and thereby accomplish various predetermined tasks. However, if an EP is exploited by an attacker, the attacker can perform malicious operations on the software by entering specific commands or data through the EP.
An EP typically appears in the software source code in the form of a function. Taking C language source code as an example, an EP may be defined as an input class function provided by a Dynamic Link Library (DLL), a System Call Interface (SCI), an Application Programming Interface (API), a library function, and the like. Input class functions include user input functions (e.g., scanf ()), system environment input functions (e.g., getenv ()), network input functions (e.g., read ()), and the like. In practical applications, an EP may also be a user-defined function.
A VP is an execution point in the software source code. Through VPs, various kinds of processing can be carried out to accomplish predetermined tasks. However, if these execution points are indirectly manipulated by an attacker through an EP, adverse consequences such as software operation errors and crashes may result.
VPs typically include execution class functions and assignment statements. Still taking C language source code as an example, execution class functions and assignment statements provided by DLLs, SCIs, APIs, library functions and the like can be defined as VPs. The execution class functions include access variable execution functions (such as execute ()), Operating System (OS) command execution functions (such as system ()), database command execution functions (such as the embedded database query command EXEC SQL), and the like; the assignment statements include, for example, the string copy function strcpy (). An access variable execution function may cause a code injection vulnerability, an OS command execution function may cause a code injection vulnerability, a database command execution function may cause a database injection vulnerability, and an assignment statement may cause a buffer overflow vulnerability. In addition, when the sequence of accesses performed on a resource (a file, the network, etc.) is abnormal and leads to competition for the same resource, a race condition vulnerability may arise, with the result that the program cannot run normally. In practical applications, a VP may also be a user-defined function.
As can be seen from the above, the present invention determines PVEPs by establishing the AST and performing path analysis on it. Because the process of establishing the AST is not limited to static lexical and syntactic analysis but also includes semantic analysis, the PVEPs found by path analysis of the established AST can systematically and comprehensively reflect the various security vulnerabilities in the software source code, so that security vulnerabilities are detected effectively.
The above process of analyzing the source code and establishing the AST may be completed by the front end of a compiler. A compiler reads program source code and translates it into a target language. Existing compilers comprise a front-end compiler and a back-end compiler. The front-end compiler performs lexical, syntactic and semantic analysis on the source code, establishes the AST, and converts the source code into the Internal Representation (IR) of the compiler, i.e., a language that the compiler can recognize. The back-end compiler analyzes and optimizes the internal representation and finally generates the object code. The invention uses a front-end compiler to perform lexical, syntactic and semantic analysis on the source code and generate the AST.
Fig. 1 is a flowchart of a software security vulnerability detection method in an embodiment of the present invention. This embodiment employs the front-end compiler CC1 of the GNU C Compiler (GCC) as the module that analyzes the source code and builds the AST. GCC, released by GNU, is a powerful, high-performance multi-platform compiler for Linux systems; GNU is an acronym for "GNU's Not Unix". As shown in Fig. 1, the method comprises the following steps:
Step 101: The source code of the software to be detected and the predefined EPs and VPs are read.
In this step, predefined EP and VP may be recorded in an EP profile and a VP profile, respectively. The EP profile and VP profile may also be the same file.
The EP profile consists of a plurality of lines, each line describing one EP. The format of each line may be: function name: manipulatable parameter. Here, the manipulatable parameter is one or more input parameters of the function. Since a function may have multiple input parameters and not every parameter can be directly exploited by an attacker, which input parameters of the EP are manipulatable parameters can be defined as required.
Taking the C string input function gets () as an example, the EP profile records gets: 1, which indicates that the function gets () is an EP and that its first input parameter is the manipulatable parameter. gets () has only one input parameter, namely the string variable that receives the input.
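As a rough illustration of the profile format described above, the following C sketch parses one profile line of the form function name: parameter index. The names profile_entry and parse_profile_line are hypothetical, and the assumption that each line carries exactly one 1-based parameter index is made only for this sketch; the embodiment does not prescribe a parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one profile line such as "gets: 1". */
struct profile_entry {
    char name[64];  /* function name, e.g. "gets" */
    int  param;     /* assumed 1-based index of the manipulatable (or risk) parameter */
};

/* Parses a line of the form "name: index".
   Returns 0 on success and -1 if the line is malformed. */
int parse_profile_line(const char *line, struct profile_entry *out)
{
    char name[64];
    int param;

    assert(line != NULL && out != NULL);

    /* %63[^: \t] reads the name up to a colon or blank;
       the blanks in the format skip optional whitespace around the colon. */
    if (sscanf(line, " %63[^: \t] : %d", name, &param) != 2 || param < 1)
        return -1;

    strcpy(out->name, name);  /* both buffers hold 64 bytes, so this cannot overflow */
    out->param = param;
    return 0;
}
```

The same routine could be applied to a VP profile line such as system: 1, since both profiles share the line format.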
Taking C language source code as an example, functions that can be defined as EPs include, but are not limited to: a function that reads data from a file; a function that reads data at a specified offset from the file indicated by a file handle; a function that reads a string from a file; a function that reads a string from the standard input device; fgetws (), which reads a wide string from a file; gethostname (), which obtains the host name from the system environment; getdomainname (), which obtains the domain name from the system environment; a function that reads data from the standard input device; sscanf (), which reads data from a string; fscanf (), which reads data from a file; and the like.
The VP profile consists of a plurality of lines, each line describing one VP. The format of each line may be: function name: risk parameter. Here, the risk parameter is one or more input parameters of the function. Since a function may have multiple input parameters and not every parameter can be indirectly manipulated by an attacker so as to cause a security vulnerability, which input parameters of the VP are risk parameters can be defined as required. When a risk parameter is manipulated, the function to which it belongs causes a security vulnerability.
Taking the C command execution function system () as an example, the VP profile records system: 1, which indicates that the function system () is a VP and that its first input parameter is the risk parameter. When the first argument of system () is contaminated (tainted), that is, manipulated, the function system () may be controlled by an attacker through some EP, which creates a code injection vulnerability. The first argument of system () is the name of the command to be executed.
Taking C language source code as an example, functions that can be defined as VPs include, but are not limited to: system (), which executes a shell command; execute (), which executes a specified program; execute (), which executes a program in a specified path; execute (), which executes a specified program file; execute (), which executes a program in a specified path; execute (), which executes a specified program file; post (), which creates a process to execute a command; and the like.
Step 102: Perform lexical, syntactic and semantic analysis on the read source code, establish the AST, and determine the EPs and VPs in the established AST according to the predefined EPs and VPs.
In this step, the process of establishing the AST by lexical, syntactic and semantic analysis of the read source code is prior art. Lexical analysis, the first step in generating the AST, decomposes the whole source code into a plurality of tokens, each of which is a single atomic unit of the language, such as a keyword, an identifier or a symbol name in a program statement. Syntax analysis recognizes the syntactic structure of the program by analyzing the order of the tokens, and it is in this phase that construction of the parse tree, i.e. the AST, begins. The AST is a tree structure described by formal grammar rules, and it uses this tree structure to represent the sequence of tokens. Semantic analysis adds semantic information to the AST and performs semantic checking. The AST is built up as syntax analysis and semantic analysis alternate.
Nodes in the AST reflect program statements in the source code. Each node may have children, each of which may have its own children. Each node has at least one attribute, such as the Location (Location), Type (Type), Scope, etc., of the program statement recorded by the node.
After the AST is established, or while it is being established, each node on the AST is judged in turn to determine whether it is a predefined EP or VP. If a node on the AST matches a predefined EP, the node is recorded as an EP; if a node on the AST matches a predefined VP, the node is recorded as a VP.
In this step, the EPs and VPs in the AST may be recorded in a table, or an EP/VP attribute may be added to each node of the AST. For example, when a node is an EP, the EP/VP attribute value is set to one preset value, for example 1; when a node is a VP, the EP/VP attribute value is set to another preset value, for example 0. Of course, the EP attribute and the VP attribute may also be represented by a two-bit sequence, in which the two bits represent the EP attribute and the VP attribute respectively.
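The two-bit marking mentioned above can be sketched in C as follows. The node structure ast_node, the flag names MARK_EP and MARK_VP, and the helper functions are hypothetical and greatly simplified; the node representation actually used by CC1 is far richer and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-bit marking: bit 0 is the EP attribute, bit 1 the VP attribute. */
enum node_mark {
    MARK_NONE = 0,
    MARK_EP   = 1 << 0,
    MARK_VP   = 1 << 1
};

/* Hypothetical, much simplified AST node. */
struct ast_node {
    const char *callee;  /* name of the called function, or NULL */
    int line;            /* source line of the program statement */
    unsigned marks;      /* combination of MARK_EP and MARK_VP */
};

/* Sets one or both attribute bits on a node. */
void mark_node(struct ast_node *n, unsigned mark)
{
    assert(n != NULL);
    n->marks |= mark;
}

int is_ep(const struct ast_node *n) { return (n->marks & MARK_EP) != 0; }
int is_vp(const struct ast_node *n) { return (n->marks & MARK_VP) != 0; }
```

With separate bits, a node that is both an EP and a VP simply has both bits set, which a single shared attribute value cannot express.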
Step 103: searching an execution path from EP to VP in AST, and if the VP on the execution path can be controlled by the EP on the execution path, determining that the execution path is PVEP.
In this step, each EP in the AST is sequentially taken as a current EP, and the following processing is performed for the current EP:
a1, taking the input variables defined as manipulatable parameters in the current EP as the initial PEV (potentially manipulated variables); the initial PEV may be one variable or more than one variable;
b1, starting from the current EP, performing a path search in the AST according to a preset search algorithm and, for each node searched, determining in turn whether the current search node is a VP and which intermediate PEVs appear in it; if the current search node is a VP, determining whether the risk parameter of the VP is the initial PEV or an intermediate PEV on the execution path where the VP is located; if so, the execution path between the current EP and the VP is determined to be a PVEP.
The above-described a1 and b1 steps are repeated for each EP in the AST until the path search for each EP is all completed.
In step b1, the search algorithm used in the path search is usually depth-first search. The search starts from the root node of the tree and proceeds depth first until it reaches a last-level node of the tree; when a branch of the tree is encountered during the search, either the left branch or the right branch may be searched first according to a preset rule. Taking left-branch-first search as an example: starting from a root node on the AST, the left child node of the root node is searched first, then the left child node of that node, and so on until a last-level node of the tree is reached. The search then returns to the parent node of the last-level node and judges whether the parent node has a right child node. If so, the search continues with the right child node and then with the left child node of that right child node, until a last-level node of the tree is reached; otherwise, the search returns to the next node up. This continues until all paths from the root node have been traversed, at which point the path search for the root node ends.
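The left-branch-first depth-first search described above can be sketched in C as a short recursive routine. The tree_node structure, the MAX_CHILDREN limit and the dfs function are hypothetical names chosen for this sketch, and the integer id merely stands in for whatever a real AST node records.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical tree node; children are stored leftmost first. */
struct tree_node {
    int id;
    struct tree_node *child[MAX_CHILDREN];
    int nchild;
};

/* Left-first depth-first traversal: visit the node itself, then each
   subtree from left to right. Visited ids are stored in order[] while
   there is room (capacity cap). Returns the number of nodes visited so far. */
size_t dfs(const struct tree_node *n, int *order, size_t cap, size_t count)
{
    assert(order != NULL);
    if (n == NULL)
        return count;
    if (count < cap)
        order[count] = n->id;
    count++;
    for (int i = 0; i < n->nchild; i++)
        count = dfs(n->child[i], order, cap, count);
    return count;
}
```

Returning from a recursive call corresponds to the step of going back to the parent node and checking for a further branch.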
In step b1, for each current searching node in the path searching process, a judgment is made whether the node is a VP, and if yes, it is determined that an execution path from the current EP to the VP is found.
In step b1, the intermediate PEVs appearing in the current search node can be determined by data flow analysis. In addition, to make the PEV analysis more effective, control flow analysis may be added to the data flow analysis. Both analyses may be performed as each node is searched. Data flow analysis is the analysis of the definition and use of variables. For example, when a variable is a PEV, the variable is regarded as contaminated; if the PEV is copied or assigned to another variable, that other variable is also contaminated and becomes another PEV. Tracking this propagation is data flow analysis. However, a program may perform a validity check before copying or assigning one variable to another, for example judging whether the length of the first variable exceeds the legal length or whether its value includes an illegal character (for example, "/"). If the check fails, the copy or assignment is not performed, or is performed only after corresponding processing, so the second variable is not contaminated even though a copy or assignment statement exists. Judging whether the program contains such a validity check statement is control flow analysis.
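The propagation rule described above can be sketched in C as follows. The PEV check list is modelled as a small array of variable names, and the result of the control flow analysis is reduced to a single guarded flag; pev_list, pev_contains, pev_add and propagate_assignment are hypothetical names, and a real analysis would work on AST nodes rather than on strings.

```c
#include <assert.h>
#include <string.h>

#define MAX_PEV 32

/* Hypothetical PEV check list: names of variables currently considered contaminated. */
struct pev_list {
    const char *name[MAX_PEV];
    int count;
};

int pev_contains(const struct pev_list *l, const char *var)
{
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->name[i], var) == 0)
            return 1;
    return 0;
}

void pev_add(struct pev_list *l, const char *var)
{
    if (pev_contains(l, var))
        return;
    assert(l->count < MAX_PEV);
    l->name[l->count++] = var;
}

/* Data flow step for an assignment "dst = src".
   guarded is 1 when control flow analysis found a validity check protecting
   the assignment; in that case dst is not treated as contaminated.
   Returns 1 if dst is a PEV after the step, otherwise 0. */
int propagate_assignment(struct pev_list *l, const char *dst,
                         const char *src, int guarded)
{
    if (pev_contains(l, src) && !guarded)
        pev_add(l, dst);
    return pev_contains(l, dst);
}
```

Treating every guarded assignment as safe is a simplification made for this sketch; whether a given check is actually sufficient would have to be decided case by case.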
Step 104: Generate a test case according to the detected PVEP and report it to the tester.
In practice, if the execution process of a PVEP is relatively simple, so that the test input and the test condition can be expressed directly, the test input and the test condition can be reported as the content of a test script. The test input is a parameter value input by a user, the test environment or the network; the test condition is the condition information under which the VP on the PVEP can be controlled by the EP.
When the execution process of the PVEP is complex, so that the test input and the test condition for testing the PVEP cannot easily be determined and must be set manually, a path description of the PVEP can be recorded in a test report and output to the tester, who performs subsequent tests according to the test report. The path description of the PVEP may consist of the line numbers of the program statements recorded by the nodes on the execution path, presented as a table or as a stack.
In the following, the case where the execution process of PVEP is simple and a test script can be generated is described by way of example.
First, consider example 1, which uses a test input as the test script. In example 1, the risk parameter in the VP is the manipulatable parameter in the EP. The C language source code of the software program to be detected in example 1 is as follows:
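#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char str[100];
    scanf("%s", str);   /* EP: str is read from user input */
    system(str);        /* VP: str is executed as an OS command */
    return 0;
}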
After CC1 reads the source code of the software to be detected, it generates the AST. The node scanf () is an EP whose input parameter str is the manipulatable parameter, so the variable str is the initial PEV; system () is a VP whose input parameter str is the risk parameter. Since the risk parameter str is the initial PEV, the execution path from scanf ("%s", str) to system (str) is a PVEP. Analysis of the PVEP shows that str is input directly by the user and is executed as an operating system command without any check, which results in a code injection vulnerability. Thus, if str = "cat /etc/passwd", execution along the PVEP displays the passwd file stored under /etc, i.e. the system user name file. To attack a system, a hacker first needs to obtain the list of valid users in the system and can then crack the passwords of those users, so the user name file that stores the list of valid users is sensitive information. In this embodiment, str = "cat /etc/passwd" is therefore taken as the content of the test script; it is the test input.
According to the PVEP detection result, a tester can improve the software source code. For example, a judgment statement can be added before the call to system (str) to judge whether the value of str is safe, for example that str is not the prohibited command cat /etc/passwd; if the value of str is safe, system (str) is executed; otherwise, system (str) is not executed. The modified software source code is then tested again with the test script, and if the user name file of the system is no longer displayed, the original security vulnerability has been successfully remedied. Generating test scripts thus enables automated software testing, reduces the burden on testers and improves test efficiency.
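One possible form of the judgment statement mentioned above is sketched below in C. Instead of rejecting individual prohibited values, this sketch accepts only commands that appear on a fixed allow-list, which is a swapped-in technique and not something the embodiment prescribes. The function is_allowed_command and the list entries are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list; the commands are placeholders for this sketch. */
static const char *const allowed_commands[] = { "date", "uptime" };

/* Returns 1 only if cmd exactly matches an allow-listed command, otherwise 0. */
int is_allowed_command(const char *cmd)
{
    assert(cmd != NULL);
    size_t n = sizeof allowed_commands / sizeof allowed_commands[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(cmd, allowed_commands[i]) == 0)
            return 1;
    return 0;
}
```

In the program of example 1, the call would then be written as if (is_allowed_command(str)) system(str); so that a value of str such as the one used in the test script never reaches system ().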
Example 2 also uses a test input as the test script. In example 2, the risk parameter in the VP is not the manipulatable parameter in the EP itself but is contaminated by it. The C language source code of example 2 is as follows:
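#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char str[100], *A;
    scanf("%s", str);   /* EP: str is the initial PEV */
    A = str;            /* A becomes an intermediate PEV */
    system(A);          /* VP: the risk parameter A is contaminated */
    return 0;
}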
Here, scanf () is an EP whose input parameter str is the manipulatable parameter, so the variable str is the initial PEV; system () is a VP whose input parameter A is the risk parameter. Since str is assigned to A, A is an intermediate PEV, and thus the execution path from scanf ("%s", str) to system (A) is a PVEP. As can be seen, when a user assigns "cat /etc/passwd" to str, the passwd file stored under /etc will be displayed. Thus, the test script generated by step 104 is str = "cat /etc/passwd", which is the test input.
Another example 3 uses test inputs and test conditions as test scripts. In example 3, the C language source code is in the form:
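#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char str[100];
    int X;
    scanf("%d%s", &X, str);   /* EP: X and str are read from user input */
    if (X > 0)
        system(str);          /* VP: reached only when X > 0 */
    return 0;
}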
In example 3, system (str) is executed only if X is greater than zero, so the test script generated in this example may be: str = "cat /etc/passwd", X = 1. Here str = "cat /etc/passwd" is the test input and X = 1 is the test condition.
The flow ends here.
The embodiment shown in fig. 1 is described in detail below in conjunction with fig. 2. As shown in fig. 2, the method includes:
Step 201: Open the EP configuration file and the VP configuration file, and read the EPs and VPs recorded in them.
Step 202: Read and analyze the source code and establish the AST; identify all EPs and VPs appearing in the AST and mark them in the AST.
The operation of analyzing the source code and establishing the AST in this step is a known technical means, and can be performed by using a front-end compiler CC1 in the GCC, and will not be described in detail here. However, the steps of identifying all EP and VP present in AST and marking in AST cannot be completed by the existing CC1, and modification to CC1 is required to add a function for identifying EP and VP.
In this step, the specific way of marking the identified EP and VP on the AST is as follows: in the process of establishing the AST, whether a node on the AST is an EP is searched according to an EP configuration file, an EP attribute is assigned to be 1 in the node description of the EP, and a VP attribute is assigned to be 0, which indicates that the node is only the EP; and searching whether the node on the AST is the VP according to the VP configuration file, assigning a VP attribute to be 1 and an EP attribute to be 0 in the node description of the VP, and indicating that the node is only the VP. If a node is both an EP and a VP, then the EP attribute and the VP attribute are assigned to 1 in the node description for that node.
Of course, the recording of the EP and VP may be realized by recording the EP and VP on the AST in the table.
In this embodiment, before analyzing the source code, CC1 must read the source code and convert it into a language that the compiler can recognize. Typically, when the source code is program code written in a high-level programming language such as C/C++, Pascal or Java, CC1 can read it directly and perform the conversion. However, when the source code of a program is difficult to obtain in practice, the executable program (Executable Software) may be decompiled into assembly language in this embodiment, and CC1 then reads the program code written in assembly language and completes the conversion. Both high-level languages and assembly languages are programming languages with data structures. The compiler's ability to recognize such a language and convert it into the internal representation is implemented using a configuration file input by the user. Each language has its own configuration file, which records the data structures of the language and is the basis for the conversion by the compiler. These data structures are described using the definitions of a lexical analyzer (LEX) and a parser (YACC), so that LEX can perform the lexical analysis of the language and YACC can perform its syntactic analysis.
Step 203: one node in the AST marked as EP is selected as the current EP.
Step 204: Record the position information of the current EP; determine the manipulatable parameters of the current EP as the initial PEV and record the initial PEV in the PEV check list of the current round of path search; then start searching from the current EP and take the next node searched as the current search node.
In this step, the line number in the source code of the program statement described by the current EP is pushed onto a stack to record the position information of the current EP. Each time a new round of path search is started from an EP, a new PEV check list is created.
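The stacking of line numbers described above can be sketched in C with a small fixed-size stack. The names line_stack, push_line and pop_line, as well as the MAX_DEPTH limit, are hypothetical and chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 256

/* Hypothetical stack of source line numbers describing the path searched so far. */
struct line_stack {
    int line[MAX_DEPTH];
    size_t depth;
};

/* Records the line number of the node just entered.
   Returns 0 on success and -1 if the stack is full. */
int push_line(struct line_stack *s, int line)
{
    assert(s != NULL);
    if (s->depth == MAX_DEPTH)
        return -1;
    s->line[s->depth++] = line;
    return 0;
}

/* Removes the most recent line number, for example when the search backtracks.
   Returns 0 on success and -1 if the stack is empty. */
int pop_line(struct line_stack *s, int *line)
{
    assert(s != NULL && line != NULL);
    if (s->depth == 0)
        return -1;
    *line = s->line[--s->depth];
    return 0;
}
```

When a VP is reached, the stack contents from bottom to top give the line numbers along the path from the current EP, which is one way to obtain the path description recorded in a test report.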
Step 205: and recording the position information of the current searching node, determining the middle PEV appearing in the current searching node, and recording the middle PEV in the PEV checking list of the path searching in the current round.
In this step, the line number of the program statement described by the current search node in the source code is stacked, and the intermediate PEV appearing in the current search node is determined by the PEV determination method described in the foregoingstep 103.
When recording a PEV, the scope of the PEV may further be considered. The scope of a PEV is the set of nodes that the PEV can contaminate, and it may directly adopt the scope of the node where the PEV is located. Obtaining the node scope by the front-end compiler CC1 is a known technique, and it basically works as follows. First, the value of the Scope attribute of the node is obtained. If the value is GLOBAL, the node is a global node, the PEV obtained from the node is a global variable, and the scope of the PEV is all nodes in the AST. If the value is LOCAL, the node is a local node, the PEV obtained from the node is a local variable, and the scope of the PEV is all sibling nodes in the AST that have the same parent node as the node where the PEV is located. If the program statement described by a sibling node is a function, the scope of the PEV does not include the child nodes of that sibling node; in other cases, for example when the program statement described by a sibling node is an expression, the scope of the PEV includes the child nodes of that sibling node.
Therefore, in step 205, when recording a PEV, it may be determined whether the scope of each PEV already in the PEV check list includes the current search node, and any PEV whose scope does not include the current search node may be deleted from the PEV check list. Since a deleted PEV cannot contaminate variables in subsequent search nodes of the current round of path search, it need not be considered in subsequent PEV determination operations.
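The scope rule and the pruning of the PEV check list described above can be sketched as follows. This is a minimal sketch under assumptions: the `Node` class, its `kind` and `scope_attr` fields, and the function names are hypothetical; the node where the PEV is located is treated as part of its own scope; and the check list is modelled as a dictionary from PEV name to scope.

```python
class Node:
    """Hypothetical AST node; kind is 'function' or 'expression'."""

    def __init__(self, name, kind="expression", scope_attr="LOCAL"):
        self.name = name
        self.kind = kind
        self.scope_attr = scope_attr  # 'GLOBAL' or 'LOCAL'
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def subtree_names(node):
    """Names of a node and all of its descendants."""
    names = {node.name}
    for child in node.children:
        names |= subtree_names(child)
    return names


def pev_scope(node):
    """Scope of a PEV found at `node`, following the GLOBAL/LOCAL rule."""
    if node.scope_attr == "GLOBAL":
        root = node
        while root.parent is not None:
            root = root.parent
        return subtree_names(root)          # all nodes in the AST
    if node.parent is None:
        return {node.name}
    scope = set()
    for sibling in node.parent.children:    # includes the node itself (assumption)
        scope.add(sibling.name)
        if sibling.kind != "function":      # children of function siblings are excluded
            scope |= subtree_names(sibling)
    return scope


def prune_check_list(check_list, current_name):
    """Keep only the PEVs whose scope includes the current search node."""
    return {pev: scope for pev, scope in check_list.items() if current_name in scope}


root = Node("root", kind="function")
a = root.add(Node("a"))
f = root.add(Node("f", kind="function"))
f.add(Node("f1"))
e = root.add(Node("e"))
e.add(Node("e1"))
g = root.add(Node("g", scope_attr="GLOBAL"))

check_list = {"pev_a": pev_scope(a), "pev_g": pev_scope(g)}
```

In this example, a PEV found at `a` reaches `e1` (child of an expression sibling) but not `f1` (child of a function sibling), so pruning at `f1` leaves only the global PEV in the check list.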
The scope of a PEV and its role in recording PEVs are illustrated below with an example. Fig. 3 is a schematic diagram of an AST structure. As shown in Fig. 3, each circle represents a node; the N in a circle is the abbreviation of Node, the number in the circle is the node number, and a straight line between two circles represents the connection between the two nodes. The nodes and their connections constitute an AST. In the AST shown in Fig. 3, N1 is the root node and is an EP, N2 is the left child node of N1, N5 is the right child node of N1, N3 and N4 are child nodes of N2, and N6 is a child node of N5. Suppose PEV2 is determined when analyzing N3. Since the scope of N3 consists of its sibling nodes with the same parent node, i.e., N4, the other child node of N2, the scope of PEV2 includes N4. Therefore, when analyzing N4, if a variable in N4 is contaminated by PEV2, that variable is an intermediate PEV appearing in N4.
However, once N2 and its child nodes have been analyzed and N5 is to be analyzed, PEV2 may be deleted from the PEV check list because its scope does not include N5; whether an intermediate PEV appears in N5 is then determined according to the PEVs remaining in the PEV check list.
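The Fig. 3 example can be replayed in a few lines. The sketch below assumes a depth-first visiting order N1 to N6, treats the initial PEV of the EP at N1 (called `PEV1` here, a hypothetical name) as a global variable so that it survives the whole round, and uses a simplified sibling-only scope for local PEVs; none of these choices is prescribed by the embodiment.

```python
# AST of Fig. 3 as a parent -> children mapping.
CHILDREN = {
    "N1": ["N2", "N5"],
    "N2": ["N3", "N4"],
    "N3": [],
    "N4": [],
    "N5": ["N6"],
    "N6": [],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}
ALL_NODES = set(CHILDREN)


def sibling_scope(node):
    """Local scope: the node and its siblings under the same parent."""
    parent = PARENT.get(node)
    return {node} if parent is None else set(CHILDREN[parent])


def walk(order, found):
    """Replay a search round and return the check list contents at each node.

    `found` maps a node to the (pev_name, is_global) pairs determined there.
    """
    check_list = {}
    history = {}
    for node in order:
        # Delete PEVs whose scope does not include the current search node.
        check_list = {p: s for p, s in check_list.items() if node in s}
        # Record PEVs determined at this node together with their scope.
        for pev, is_global in found.get(node, []):
            check_list[pev] = ALL_NODES if is_global else sibling_scope(node)
        history[node] = sorted(check_list)
    return history


history = walk(
    ["N1", "N2", "N3", "N4", "N5", "N6"],
    {"N1": [("PEV1", True)], "N3": [("PEV2", False)]},
)
```

Here `history["N4"]` still contains `PEV2`, while `history["N5"]` no longer does, matching the deletion described for Fig. 3.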
Step 206: Determine whether the current search node is a VP; if so, perform step 207; otherwise, perform step 209.
In this step, it is determined whether the VP attribute of the current search node is 1; if so, the current search node is determined to be a VP and step 207 is performed.
Step 207: Determine whether the risk parameter of the VP is a PEV in the PEV check list; if so, perform step 208; otherwise, perform step 209.
In this embodiment, when the current search node is analyzed, the intermediate PEVs appearing in the current search node are first recorded in the PEV check list, it is then determined whether the current search node is a VP, and if so, it is determined whether the risk parameter of the VP is a PEV in the PEV check list. In an alternative implementation, it may first be determined whether the current search node is a VP; if so, it is then determined whether the risk parameter in the VP is contaminated by a PEV in the PEV check list, and if so, the contaminated risk parameter in the VP is added to the PEV check list and step 208 is performed.
Step 208: Determine the path from the current EP to the VP as a PVEP, generate a test case according to the PVEP, and report the test case to a tester.
It can be seen that two conditions must be met to determine a PVEP in the AST: first, the current search node is a VP; second, the risk parameter of that node is a PEV.
Step 209: Determine whether the current search node is the last node in the current round of path search; if so, perform step 210; otherwise, perform step 212.
Step 210: Determine whether all EPs on the AST have been traversed; if so, end the flow; otherwise, perform step 211.
Step 211: Select the next node marked as EP in the AST as the current EP, and return to step 204.
Step 212: Take the next node in the current round of path search as the current search node, and return to step 205.
In step 212, the next node is determined according to the preset search algorithm.
The flow ends here.
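Steps 203 to 212 can be summarized in a short sketch. Several simplifications are assumptions made only for illustration: the AST nodes are given as a list already in the order produced by the search algorithm, an intermediate PEV is any variable assigned from an expression that uses a PEV (a stand-in for the method of step 103), scope pruning is omitted, and the search continues after a PVEP is reported. The `Node` fields and `find_pveps` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    line: int                                   # line number in the source code
    is_ep: bool = False
    is_vp: bool = False
    manipulatable: Tuple[str, ...] = ()         # manipulatable parameters of an EP
    risk: Tuple[str, ...] = ()                  # risk parameters of a VP
    # (target, sources) pairs: target is assigned from an expression using sources
    assigns: Tuple[Tuple[str, Tuple[str, ...]], ...] = ()


def find_pveps(nodes: List[Node]) -> List[List[int]]:
    """Return each PVEP as the list of source line numbers from EP to VP."""
    pveps = []
    for index, ep in enumerate(nodes):          # steps 203, 210, 211: iterate over EPs
        if not ep.is_ep:
            continue
        path = [ep.line]                        # step 204: record EP position
        check_list = set(ep.manipulatable)      # step 204: new PEV check list
        for node in nodes[index + 1:]:          # step 212: next search node
            path.append(node.line)              # step 205: record node position
            for target, sources in node.assigns:
                if check_list & set(sources):   # step 205: intermediate PEV
                    check_list.add(target)
            if node.is_vp and check_list & set(node.risk):   # steps 206 and 207
                pveps.append(list(path))        # step 208: path is a PVEP
    return pveps


nodes = [
    Node(1, is_ep=True, manipulatable=("buf",)),    # e.g. an input function filling buf
    Node(2, assigns=(("cmd", ("buf",)),)),          # cmd derived from buf
    Node(3, is_vp=True, risk=("cmd",)),             # execution function using cmd
    Node(4, is_vp=True, risk=("fixed",)),           # execution function using a constant
]
pveps = find_pveps(nodes)
```

Only the path through lines 1, 2, and 3 is reported: the VP on line 4 satisfies the first condition (it is a VP) but not the second (its risk parameter is not a PEV).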
To implement the software security vulnerability detection method described in the above embodiment, an embodiment of the present invention further provides a software security vulnerability detection apparatus. Fig. 4 is a schematic structural diagram of the software security vulnerability detection apparatus in an embodiment of the present invention. As shown in Fig. 4, the apparatus includes a source code processing unit 1, a path analysis unit 2, and a security test case generation unit 3. Each unit is described in detail below.
First, the source code processing unit 1 is configured to establish an AST corresponding to the source code of the software to be detected, and to determine the EPs and VPs among the nodes of the established AST according to the predefined EPs and VPs.
Fig. 5 is a schematic structural diagram of the source code processing unit 1 in Fig. 4. As shown in Fig. 5, the source code processing unit 1 may specifically include a configuration module 11, an analysis module 12, a node type location module 13, and an AST recording module 14, wherein:
The configuration module 11 is configured to record the predefined EPs and VPs. Specifically, an EP profile and a VP profile may be stored; the specific format of these profiles has been described in detail above.
The analysis module 12 is configured to perform lexical, syntactic, and semantic analysis on the source code of the software to be detected and to establish the AST.
The node type location module 13 is configured to determine, according to the predefined EPs and VPs recorded in the configuration module 11, the EPs and VPs among the nodes of the AST established by the analysis module 12, and to record the AST and the determined EPs and VPs in the AST recording module 14.
The AST recording module 14 is configured to record the AST and the EPs and VPs on the AST, and to provide them to the path analysis unit 2. Specifically, either the AST together with a table of the EPs and VPs in the AST, or an AST carrying EP and VP marks, may be saved.
The source code processing unit 1 may further include a decompilation unit 15, configured to decompile executable program code into source code written in assembly language and send it to the analysis module 12. If the source code is program code written in a high-level programming language, it can be read directly by the analysis module 12.
The path analysis unit 2 in the detection apparatus is configured to determine, among the nodes of the AST established by the source code processing unit 1, the execution paths between EPs and VPs, and, if the VP on an execution path can be controlled by the EP on that execution path, to determine the execution path as a PVEP that may cause a security vulnerability. Fig. 6 is a schematic structural diagram of the path analysis unit 2 in Fig. 4. As shown in Fig. 6, the path analysis unit 2 may specifically include an execution path locating module 21 and a vulnerability locating module 22, wherein:
The execution path locating module 21 is configured to search for the execution paths between each EP and the VPs by traversing the nodes of the AST using a preset search algorithm, and to notify the vulnerability locating module 22 of each execution path found. As mentioned above, the preset search algorithm may be a depth-first search algorithm.
The vulnerability locating module 22 is configured to determine, after receiving the notification from the execution path locating module 21, whether the VP on the execution path can be controlled by the EP on the execution path, and if so, to determine the execution path as a PVEP.
When the predefined EPs and VPs include manipulatable parameters and risk parameters, the execution path locating module 21 is further configured, when searching for an execution path, to take the manipulatable parameters in each EP as initial PEVs, determine the intermediate PEVs on the execution path that are contaminated by the initial PEVs, and send the initial PEVs and the determined intermediate PEVs to the vulnerability locating module 22. As described above, the intermediate PEVs may be determined by data flow analysis or by a combination of data flow analysis and control flow analysis.
Correspondingly, the vulnerability locating module 22 is specifically configured to determine, after receiving the notification, the initial PEVs, and the intermediate PEVs from the execution path locating module 21, whether the risk parameter in the VP on the execution path is an initial PEV or an intermediate PEV, and if so, to determine the execution path as a PVEP.
Specifically, the vulnerability locating module 22 may include a judgment sub-module 61 and a locating sub-module 62, wherein:
The judgment sub-module 61 is configured to receive the notification, the initial PEVs, and the intermediate PEVs sent by the execution path locating module 21, determine whether the risk parameter in the VP on the execution path found by the execution path locating module 21 is an initial PEV or an intermediate PEV, and send the determination result to the locating sub-module 62.
The locating sub-module 62 is configured to determine, when the received determination result is yes, the execution path found by the execution path locating module 21 as a PVEP.
The security test case generation unit 3 in the detection apparatus is configured to generate a test case according to the PVEP received from the path analysis unit 2. Specifically, the security test case generation unit 3 may determine whether the received PVEP can be represented by a test input and a test condition; if so, it generates a test script according to the test input and the test condition and reports the test script to a tester; otherwise, it records the path of the PVEP in a test report and reports the test report to the tester.
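The two outcomes of the security test case generation unit 3 can be sketched as a simple branch. The embodiment does not specify how it is decided that a PVEP can be represented by a test input and a test condition, so the sketch assumes, purely for illustration, that a PVEP record either carries both values or does not; the `PVEP` fields, the script format, and `handle_pvep` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PVEP:
    path_lines: List[int]                  # source line numbers from EP to VP
    test_input: Optional[str] = None       # hypothetical: input that drives the path
    test_condition: Optional[str] = None   # hypothetical: condition to observe


def handle_pvep(pvep: PVEP) -> Tuple[str, str]:
    """Return ('test_script', text) or ('test_report', text) for the tester."""
    if pvep.test_input is not None and pvep.test_condition is not None:
        # The PVEP can be represented by a test input and a test condition.
        script = f"input: {pvep.test_input}\nexpect: {pvep.test_condition}\n"
        return ("test_script", script)
    # Otherwise only the path itself is recorded in the test report.
    path = " -> ".join(str(line) for line in pvep.path_lines)
    return ("test_report", f"PVEP path (source lines): {path}")


scripted = handle_pvep(PVEP([1, 2, 3], test_input="A" * 8, test_condition="no crash"))
reported = handle_pvep(PVEP([5, 9]))
```

In either case the tester receives something actionable: a script that can be run directly, or the location of the path to inspect manually.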
As can be seen from the above, the basis of detecting the security vulnerability in the present invention is to analyze the source code, and the analysis is not limited to static lexical and syntactic analysis, but also includes semantic analysis and execution path analysis capable of simulating the software execution process, so that various execution paths with potential risks can be effectively detected, and thus the security vulnerability existing in the software source code can be effectively detected.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.