![]() | This articlemay be too technical for most readers to understand. Pleasehelp improve it tomake it understandable to non-experts, without removing the technical details.(August 2017) (Learn how and when to remove this message) |
Differential testing,[1] also known asdifferential fuzzing, is asoftware testing technique that detectbugs, by providing the same input to a series of similar applications (or to different implementations of the same application), and observing differences in their execution. Differential testing complements traditional software testing because it is well-suited to findsemantic orlogic bugs that do not exhibit explicit erroneous behaviors like crashes or assertion failures. Differential testing is also called back-to-back testing.
Differential testing finds semantic bugs by using different implementations of the same functionality as cross-referencingoracles, pinpointing differences in their outputs over the same input: any discrepancy between the program behaviors on the same input is marked as a potential bug.
Differential testing has been used to find semantic bugs successfully in diverse domains likeSSL/TLS implementations,[2][3][4][5]C compilers,[6]JVM implementations,[7]Web application firewalls,[8] security policies forAPIs,[9]antivirus software,[4][10] andfile systems.[11] Differential testing has also been used for automated fingerprint generation from differentnetwork protocol implementations.[12]
Unguided differential testing tools generate test inputs independently across iterations without considering the test program’s behavior on past inputs. Such an input generation process does not use any information from past inputs and essentially creates new inputs at random from a prohibitively large input space. This can make the testing process highly inefficient, since large numbers of inputs need to be generated to find a single bug.
An example of a differential testing system that performs unguided input generation is "Frankencerts".[2] This work synthesizes Frankencerts by randomly combining parts of real certificates. It uses syntactically valid certificates to test for semantic violations of SSL/TLS certificate validation across multiple implementations. However, since the creation and selection of Frankencerts are completely unguided, it is significantly inefficient compared to the guided tools.
Guided input generation process aims to minimize the number of inputs needed to find each bug by taking program behavior information for past inputs into account.
An example of a differential testing system that performs domain-specificcoverage-guided input generation is Mucerts.[3] Mucerts relies on the knowledge of the partial grammar of theX.509 certificate format and uses a stochastic sampling algorithm to drive its input generation while tracking the program coverage.
Another line of research builds on the observation that the problem of new input generation from existing inputs can be modeled as a stochastic process. An example of a differential testing tool that uses such a stochastic process modeling for input generation is Chen et al.’s tool.[7] It performs differential testing ofJava virtual machines (JVM) usingMarkov chain Monte Carlo (MCMC) sampling for input generation. It uses custom domain-specific mutations by leveraging detailed knowledge of the Java class file format.
NEZHA[4] is an example of a differential testing tool that has a path selection mechanism geared towards domain-independent differential testing. It uses specific metrics (dubbed as delta-diversity) that summarize and quantify the observed asymmetries between the behaviors of multiple test applications. Such metrics that promote the relative diversity of observed program behavior have shown to be effective in applying differential testing in a domain-independent and black-box manner.
For applications, such ascross-site scripting (XSS) filters and X.509 certificate hostname verification, which can be modeled accurately withfinite-state automata (FSA), counter-example-driven FSA learning techniques can be used to generate inputs that are more likely to find bugs.[8][5]
Symbolic execution[13] is awhite-box technique that executes a program symbolically, computes constraints along different paths, and uses a constraint solver to generate inputs that satisfy the collected constraints along each path. Symbolic execution can also be used to generate input for differential testing.[12][14]
The inherent limitation of symbolic-execution-assisted testing tools—path explosion and scalability—is magnified especially in the context of differential testing where multiple test programs are used. Therefore, it is very hard to scale symbolic execution techniques to perform differential testing of multiple large programs.