Concolic testing (aportmanteau ofconcrete andsymbolic, also known asdynamic symbolic execution) is a hybridsoftware verification technique that performssymbolic execution, a classical technique that treats program variables as symbolic variables, along aconcrete execution (testing on particular inputs) path. Symbolic execution is used in conjunction with anautomated theorem prover or constraint solver based onconstraint logic programming to generate new concrete inputs (test cases) with the aim of maximizingcode coverage. Its main focus is finding bugs in real-world software, rather than demonstrating program correctness.
A description and discussion of the concept was introduced in "DART: Directed Automated Random Testing" by Patrice Godefroid, Nils Klarlund, and Koushik Sen.[1] The paper "CUTE: A concolic unit testing engine for C",[2] by Koushik Sen, Darko Marinov, andGul Agha, further extended the idea to data structures, and first coined the termconcolic testing. Another tool, called EGT (renamed to EXE and later improved and renamed to KLEE), based on similar ideas was independently developed by Cristian Cadar andDawson Engler in 2005, and published in 2005 and 2006.[3] PathCrawler[4][5] first proposed to perform symbolic execution along a concrete execution path, but unlike concolic testing PathCrawler does not simplify complex symbolic constraints using concrete values. These tools (DART and CUTE, EXE) applied concolic testing to unit testing ofC programs and concolic testing was originally conceived as awhite box improvement upon establishedrandom testing methodologies. The technique was later generalized to testing multithreadedJava programs with jCUTE,[6] and unit testing programs from their executable codes (tool OSMOSE).[7] It was also combined withfuzz testing and extended to detect exploitable security issues in large-scalex86 binaries byMicrosoft Research's SAGE.[8][9]
The concolic approach is also applicable tomodel checking. In a concolic model checker, the model checker traverses states of the model representing the software being checked, while storing both a concrete state and a symbolic state. The symbolic state is used for checking properties on the software, while the concrete state is used to avoid reaching unreachable states. One such tool is ExpliSAT by Sharon Barner, Cindy Eisner, Ziv Glazberg,Daniel Kroening and Ishai Rabinovitz[10]
Implementation of traditional symbolic execution based testing requires the implementation of a full-fledged symbolic interpreter for a programming language. Concolic testing implementors noticed that implementation of full-fledged symbolic execution can be avoided if symbolic execution can be piggy-backed with the normal execution of a program throughinstrumentation. This idea of simplifying implementation of symbolic execution gave birth to concolic testing.
An important reason for the rise of concolic testing (and more generally, symbolic-execution based analysis of programs) in the decade since it was introduced in 2005 is the dramatic improvement in the efficiency and expressive power ofSMT Solvers. The key technical developments that lead to the rapid development of SMT solvers include combination of theories, lazy solving, DPLL(T) and the huge improvements in the speed ofSAT solvers. SMT solvers that are particularly tuned for concolic testing includeZ3, STP, Z3str2, andBoolector.
Consider the following simple example, written in C:
voidf(intx,inty){intz=2*y;if(x==100000){if(x<z){assert(0);/* error */}}}

Simple random testing, trying random values ofx andy, would require an impractically large number of tests to reproduce the failure.
We begin with an arbitrary choice forx andy, for examplex = y = 1. In the concrete execution, line 2 setsz to 2, and the test in line 3 fails since 1 ≠ 100000. Concurrently, the symbolic execution follows the same path but treatsx andy as symbolic variables. It setsz to the expression 2y and notes that, because the test in line 3 failed,x ≠ 100000. This inequality is called apath condition and must be true for all executions following the same execution path as the current one.
Since we'd like the program to follow a different execution path on the next run, we take the last path condition encountered,x ≠ 100000, and negate it, givingx = 100000. An automated theorem prover is then invoked to find values for the input variablesx andy given the complete set of symbolic variable values and path conditions constructed during symbolic execution. In this case, a valid response from the theorem prover might bex = 100000,y = 0.
Running the program on this input allows it to reach the inner branch on line 4, which is not taken since 100000 (x) is not less than 0 (z = 2y). The path conditions arex = 100000 andx ≥z. The latter is negated, givingx <z. The theorem prover then looks forx,y satisfyingx = 100000,x <z, andz = 2y; for example,x = 100000,y = 50001. This input reaches the error.
Essentially, a concolic testing algorithm operates as follows:
There are a few complications to the above procedure:
Symbolic-execution based analysis and testing, in general, has witnessed a significant level of interest from industry.[citation needed] Perhaps the most famous commercial tool that uses dynamic symbolic execution (aka concolic testing) is the SAGE tool from Microsoft. The KLEE and S2E tools (both of which are open-source tools, and use the STP constraint solver) are widely used in many companies including Micro Focus Fortify, NVIDIA, and IBM.[citation needed] Increasingly these technologies are being used by many security companies and hackers alike to find security vulnerabilities.
Concolic testing has a number of limitations:
if(sha256_hash(input) == 0x12345678) { ... } requires the theorem prover to invertSHA256, which is an open problem.Many tools, notably DART and SAGE, have not been made available to the public at large. Note however that for instance SAGE is "used daily" for internal security testing at Microsoft.[13]