A mutation-based tool for finding bugs in tests

Necessist

Run tests with statements and method calls removed to help identify broken tests

Necessist currently supports Anchor, Foundry, Go, Hardhat, Rust, and Vitest.

A paper on Necessist (Test Harness Mutilation) appeared in Mutation 2024. (slides, preprint)


Installation

System requirements:

Install pkg-config and sqlite3 development files on your system, e.g., on Ubuntu:

sudo apt install pkg-config libsqlite3-dev

Install Necessist from crates.io:

cargo install necessist

Install Necessist from github.com:

cargo install --git https://github.com/trailofbits/necessist --branch release

Running

cd into your project's directory and type necessist (with no arguments).

For example, if you cd into the fixtures/basic directory and type necessist, you should see the following:

4 candidates in 4 tests in 1 source file
src/lib.rs: dry running
src/lib.rs: mutilating
src/lib.rs:4:5-4:12: `n += 1;` passed

Note that there will be a delay while Necessist runs a test with a timeout.

See Usage for options that can be passed to necessist.

Overview

Necessist iteratively removes statements and method calls from tests and then runs them. If a test passes with a statement or method call removed, it could indicate a problem in the test. Or worse, it could indicate a problem in the code being tested.

Example

This example is from rust-openssl. The verify_untrusted_callback_override_ok test checks that a failed certificate validation can be overridden by a callback. But if the callback were never called (e.g., because of a failed connection), the test would still pass. Necessist reveals this fact by showing that the test passes without the call to set_verify_callback:

#[test]
fn verify_untrusted_callback_override_ok() {
    let server = Server::builder().build();

    let mut client = server.client();
    client
        .ctx()
        .set_verify_callback(SslVerifyMode::PEER, |_, x509| { //
            assert!(x509.current_cert().is_some());           // Test passes without this call
            true                                              // to `set_verify_callback`.
        });                                                   //

    client.connect();
}

Following this discovery, a flag was added to the test to record whether the callback is called. The flag must be set for the test to succeed:

#[test]
fn verify_untrusted_callback_override_ok() {
    static CALLED_BACK: AtomicBool = AtomicBool::new(false); // Added

    let server = Server::builder().build();

    let mut client = server.client();
    client
        .ctx()
        .set_verify_callback(SslVerifyMode::PEER, |_, x509| {
            CALLED_BACK.store(true, Ordering::SeqCst); // Added
            assert!(x509.current_cert().is_some());
            true
        });

    client.connect();
    assert!(CALLED_BACK.load(Ordering::SeqCst)); // Added
}

Comparison to conventional mutation testing


Conventional mutation testing tries to identify gaps in test coverage, whereas Necessist tries to identify bugs in existing tests.

Conventional mutation testing tools (such as universalmutator) randomly inject faults into source code, and see whether the code's tests still pass. If they do, it could mean the code's tests are inadequate.

Notably, conventional mutation testing is about finding deficiencies in the set of tests as a whole, not in individual tests. That is, for any given test, randomly injecting faults into the code is not especially likely to reveal bugs in that test. This is unfortunate since some tests are more important than others, e.g., because ensuring the correctness of some parts of the code is more important than others.

By comparison, Necessist's approach of iteratively removing statements and method calls does target individual tests, and thus can reveal bugs in individual tests.

Of course, there is overlap in the sets of problems the two approaches can uncover, e.g., a failure to find an injected fault could indicate a bug in a test. Nonetheless, for the reasons just given, we see the two approaches as complementary, not competing.

Possible theoretical foundation


The following criterion (*) comes close to describing the statements that Necessist aims to remove:

(*) Statement S's weakest precondition P has the same context (e.g., variables in scope) as S's postcondition Q, and P does not imply Q.

The notion that (*) tries to capture is: a statement that affects a subsequent assertion. In this section, we explain and motivate this choice. For concision, we focus on statements, but the remarks in this section apply to method calls as well.

Recall the two kinds of predicate transformer semantics: weakest precondition and strongest postcondition. With the former, one reasons about the weakest precondition that could hold prior to a statement, given a postcondition that holds after the statement. With the latter, one reasons about the strongest postcondition that could hold after a statement, given a precondition that holds prior to the statement. Generally speaking, the former is more common (see Aldrich 2013 for an explanation), and it is the one we use here.

Consider a test through this lens. A test is a function with no inputs or outputs. Thus, an alternative procedure for determining whether a test passes is the following. Starting with True, iteratively work backwards through the test's statements, computing the weakest precondition of each. If the precondition arrived at for the test's first statement is True, then the test passes. If the precondition is False, the test fails.
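The backward procedure just described can be sketched for a toy language with a single integer variable x. Everything below (the Stmt type, wp, and test_passes) is hypothetical and exists only to illustrate the idea; Necessist itself does not compute weakest preconditions. The sketch assumes the test's first statement is the declaration of x.

```rust
// A predicate over the single variable `x` of this toy language.
type Pred = Box<dyn Fn(i64) -> bool>;

// Statements of a hypothetical toy test language (not Necessist's representation).
#[derive(Clone, Copy)]
enum Stmt {
    Init(i64),               // let mut x = <constant>;
    Update(fn(i64) -> i64),  // x = f(x);
    Assert(fn(i64) -> bool), // assert!(c(x));
}

// Weakest precondition of `stmt` with respect to postcondition `q`.
fn wp(stmt: Stmt, q: Pred) -> Pred {
    match stmt {
        // After `let mut x = v;`, `q` must hold for `x == v`; the prior state is irrelevant.
        Stmt::Init(v) => Box::new(move |_: i64| q(v)),
        // Substitute `f(x)` for `x` in `q`.
        Stmt::Update(f) => Box::new(move |x: i64| q(f(x))),
        // The assertion must hold, and so must `q`.
        Stmt::Assert(c) => Box::new(move |x: i64| c(x) && q(x)),
    }
}

// Start with True and work backwards through the statements.
fn test_passes(stmts: &[Stmt]) -> bool {
    let mut q: Pred = Box::new(|_: i64| true);
    for &stmt in stmts.iter().rev() {
        q = wp(stmt, q);
    }
    // Assumes the first statement is `Init`, so the result no longer depends on `x`.
    q(0)
}
```

For the statements Init(1), Update(|x| x - 1), Assert(|x| x == 0), test_passes returns true. With the Update removed it returns false, so in this toy test the update is necessary.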

Now, imagine we were to apply this procedure, and consider a statement S that violates (*). We argue that it might not make sense to remove S:

Case 1: S adds or removes variables from the scope (e.g., S is a declaration), or S changes a variable's type. Then removing S would likely result in a compilation failure. (On top of that, since S's precondition and postcondition have different contexts, it's not clear how to compare them.)

Case 2: S's precondition is stronger than its postcondition (e.g., S is an assertion). Then S imposes constraints on the environments in which it executes. Put another way, S tests something. Thus, removing S would likely detract from the test's overarching purpose.

Conversely, consider a statement S that satisfies (*). Here is why it might make sense to remove S. Think of S as shifting the set of valid environments, rather than constraining them. More precisely, if S's weakest precondition P does not imply Q, and if Q is satisfiable, then there is an assignment to P and Q's free variables that satisfies both P and Q. If such an assignment results from each environment in which S is actually executed, then the necessity of S is called into question.

The main utility of (*) is in helping to select the functions, macros, and method calls that Necessist ignores. Necessist ignores certain of these by default. Suppose that, for one of the frameworks, we are considering whether Necessist should ignore some function foo. If we imagine a predicate transformer semantics for the framework's testing language, we can ask: if statement S were a call to foo, would S satisfy (*)? If the answer is "no," then Necessist should likely ignore foo.

Consider Rust's clone method, for example. A call to clone can be unnecessary. However, if we imagine a predicate transformer semantics for Rust, a call to clone is unlikely to satisfy (*). For this reason, Necessist does not attempt to remove clone calls.

In addition to helping to select the functions, etc. that Necessist ignores, (*) has other nice consequences. For example, the rule that the last statement in a test should be ignored follows from (*). To see this, note that such a statement's postcondition Q is always True. Thus, if the statement doesn't change the context, then its weakest precondition necessarily implies Q.

Having said all this, (*) doesn't quite capture what Necessist actually does. Consider a statement like x -= 1;. Necessist will remove such a statement unconditionally, but (*) says maybe Necessist shouldn't. Assuming overflow checks are enabled, computing this statement's weakest precondition would look something like the following:

{ Q[(x - 1)/x] ^ x >= 1 }
x -= 1;
{ Q }

Note that x -= 1; does not change the context, and that Q[(x - 1)/x] ^ x >= 1 could imply Q. For example, if Q does not contain x, then Q[(x - 1)/x] = Q and Q ^ x >= 1 implies Q.
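As a worked illustration, here are two hypothetical postconditions for x -= 1; (chosen for this example only, with an extra variable y assumed to be in scope). The conjunction written ^ above appears as \land, and wp denotes the weakest precondition:

```latex
\begin{align*}
Q \equiv (y = 0):&\quad wp = (y = 0) \land (x \ge 1) \;\Rightarrow\; Q \\
Q \equiv (x = 0):&\quad wp = (x - 1 = 0) \land (x \ge 1) \;\equiv\; (x = 1) \;\not\Rightarrow\; Q
\end{align*}
```

In the first case the weakest precondition implies Q, so the statement violates (*). In the second case it does not imply Q, so the statement satisfies (*).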

Given the discrepancy between (*) and Necessist's current behavior, one can ask: which of the two should be adjusted? Put another way, should Necessist remove a statement like x -= 1; unconditionally?

One way to look at this question is: which statements are worth removing, i.e., which statements are "interesting?" As implied above, (*) considers a statement "interesting" if its removal could affect a subsequent assertion. But there are other possible, useful definitions of an "interesting" statement. For example, one could consider strongest postconditions (mentioned above), or frameworks besides Hoare logic entirely.

To be clear, Necessist does not apply (*) formally, e.g., Necessist does not actually compute weakest preconditions. The current role of (*) is to help guide which statements Necessist should ignore, and (*) seems to do well in that role. As such, we leave resolving the aforementioned discrepancy to future work.

Usage

Usage: necessist [OPTIONS] [TEST_FILES_OR_DIRS]... [-- <ARGS>...]

Arguments:
  [TEST_FILES_OR_DIRS]...  Test files or directories to mutilate (optional)
  [ARGS]...                Additional arguments to pass to each test command

Options:
      --allow <WARNING>        Silence <WARNING>; `--allow all` silences all warnings
      --default-config         Create a default necessist.toml file in the project's root directory
      --deny <WARNING>         Treat <WARNING> as an error; `--deny all` treats all warnings as errors
      --dump                   Dump sqlite database contents to the console
      --dump-candidate-counts  Dump number of removal candidates in each file and exit
      --dump-candidates        Dump removal candidates and exit (for debugging)
      --framework <FRAMEWORK>  Assume testing framework is <FRAMEWORK> [possible values: anchor, auto, foundry, go, hardhat, rust, vitest]
      --no-sqlite              Do not output to an sqlite database
      --quiet                  Do not output to the console
      --reset                  Discard sqlite database contents
      --resume                 Resume from the sqlite database
      --root <ROOT>            Root directory of the project under test
      --timeout <TIMEOUT>      Maximum number of seconds to run any test; 60 is the default, 0 means no timeout
      --verbose                Show test outcomes besides `passed`
  -h, --help                   Print help
  -V, --version                Print version

Output

By default, Necessist outputs to the console only when tests pass. Passing --verbose causes Necessist to instead output all of the removal outcomes below.

Outcome	Meaning (With the statement/method call removed...)
passed	The test(s) built and passed.
timed-out	The test(s) built but timed-out.
failed	The test(s) built but failed.
nonbuildable	The test(s) did not build.

By default, Necessist outputs to both the console and to an sqlite database. For the latter, a tool like sqlitebrowser can be used to filter/sort the results.

Details

Generally speaking, Necessist will not attempt to remove a statement if it is one of the following:

a statement containing other statements (e.g., a for loop)
a declaration (e.g., a local or let binding)
a break, continue, or return
the last statement in a test

Similarly, Necessist will not attempt to remove a method call if:

It is the primary effect of an enclosing statement (e.g.,x.foo();).
It appears in the argument list of an ignored function, method, or macro (see below).
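The following annotated Rust test is a minimal sketch of how these rules might apply. The Stack type and the test are hypothetical, and the comments reflect a reading of the rules above rather than verified tool output; running necessist --dump-candidates on a real project shows the actual candidates.

```rust
// Hypothetical code under test (not from Necessist's fixtures).
#[derive(Default)]
pub struct Stack {
    items: Vec<u32>,
}

impl Stack {
    pub fn push(&mut self, item: u32) {
        self.items.push(item);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }
}

#[test]
fn stack_grows() {
    let mut stack = Stack::default(); // declaration: not a candidate
    for item in 0..2 {                // contains other statements: not a candidate
        stack.push(item);             // plain statement: candidate for removal
    }
    stack.push(2);                    // plain statement: candidate; `.push(2)` is its primary
                                      // effect, so the call is not removed separately
    assert_eq!(stack.len(), 3);       // last statement, ignored macro, and `.len()` sits in
                                      // the macro's argument list: nothing removed here
}
```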

Also, for some frameworks, certain statements and methods are ignored. Each framework's specifics are listed below.

Anchor

In addition to the below, the Anchor backend ignores:

throw statements

Ignored functions

assert
Anything beginning with assert. (e.g., assert.equal)
Anything beginning with console. (e.g., console.log)
expect

Ignored methods

toNumber
toString

Foundry

In addition to the below, the Foundry backend ignores:

a statement immediately following a use of vm.prank or any form of vm.expect (e.g., vm.expectRevert)
an emit statement

Ignored functions

Anything beginning with assert (e.g., assertEq)
Anything beginning with vm.expect (e.g., vm.expectCall)
Anything beginning with console.log (e.g., console.log, console.logInt)
Anything beginning with console2.log (e.g., console2.log, console2.logInt)
vm.getLabel
vm.label
vm.startSnapshotGas
vm.stopSnapshotGas

Go

In addition to the below, the Go backend ignores:

defer statements

Ignored functions

Anything beginning with assert. (e.g., assert.Equal)
Anything beginning with require. (e.g., require.Equal)
panic

Ignored methods*

Close
Error
Errorf
Fail
FailNow
Fatal
Fatalf
Helper
Log
Logf
Parallel
Skip
Skipf
SkipNow

* This list is based primarily on testing.T's methods. However, some methods with commonplace names are omitted to avoid colliding with other types' methods.

Hardhat

The ignored functions and methods are the same as for Anchor above.

Rust

Ignored macros

assert
assert_eq
assert_matches
assert_ne
debug
eprint
eprintln
error
info
panic
print
println
trace
unimplemented
unreachable
warn

Ignored methods*

as_bytes
as_encoded_bytes
as_mut
as_mut_os_str
as_mut_os_string
as_mut_slice
as_mut_str
as_os_str
as_path
as_ref
as_slice
as_str
borrow
borrow_mut
clone
cloned
copied
deref
deref_mut
expect
expect_err
into_boxed_bytes
into_boxed_os_str
into_boxed_path
into_boxed_slice
into_boxed_str
into_bytes
into_encoded_bytes
into_os_string
into_owned
into_path_buf
into_string
into_vec
iter
iter_mut
success
to_os_string
to_owned
to_path_buf
to_string
to_vec
unwrap
unwrap_err

* This list is essentially the watched trait and inherent methods of Dylint's unnecessary_conversion_for_trait lint, with the following additions:

clone (e.g., std::clone::Clone::clone)
cloned (e.g., std::iter::Iterator::cloned)
copied (e.g., std::iter::Iterator::copied)
expect (e.g., std::option::Option::expect)
expect_err (e.g., std::result::Result::expect_err)
into_owned (e.g., std::borrow::Cow::into_owned)
success (e.g., assert_cmd::assert::Assert::success)
unwrap (e.g., std::option::Option::unwrap)
unwrap_err (e.g., std::result::Result::unwrap_err)

Vitest

The ignored functions and methods are the same as for Anchor above.

Configuration files

A configuration file allows one to tailor Necessist's behavior with respect to a project. The file must be named necessist.toml, appear in the project's root directory, and be TOML encoded. The file may contain one or more of the options listed below.

ignored_functions, ignored_methods, ignored_macros: A list of strings interpreted as patterns. A function, method, or macro (respectively) whose path matches a pattern in the list is ignored. Note that ignored_macros is used only by the Rust backend currently.
ignored_path_disambiguation: One of the strings Either, Function, or Method. For a path that could refer to a function or method (see below), this option influences whether the function or method is ignored.
- Either (default): Ignore if the path matches either an ignored_functions or ignored_methods pattern.
- Function: Ignore only if the path matches an ignored_functions pattern.
- Method: Ignore only if the path matches an ignored_methods pattern.
ignored_tests: A list of strings. A test whose name exactly matches a string in the list is ignored. For Mocha-based frameworks (e.g., Anchor and Hardhat), a test name is considered to be a message passed to Mocha's it function.
walkable_functions: A list of strings interpreted as patterns. If a test calls a function that matches a pattern in the list, and the function is declared in the same file as the test, then statements and method calls are removed from the function as though it were a test.
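As an illustration, a necessist.toml using these options might look like the following. The option names come from the list above; every value (log.*, describe, my_debug, slow_integration_test, setup_*) is a made-up placeholder, not a recommended setting.

```toml
# necessist.toml (illustrative values only)

# Paths matching these patterns are ignored.
ignored_functions = ["log.*"]
ignored_methods = ["describe"]
ignored_macros = ["my_debug"] # Rust backend only

# For ambiguous paths, consult only the ignored_functions patterns.
ignored_path_disambiguation = "Function"

# Tests are matched by exact name, not by pattern.
ignored_tests = ["slow_integration_test"]

# Same-file helper functions to treat as though they were tests.
walkable_functions = ["setup_*"]
```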

Patterns

A pattern is a string composed of letters, numbers, ., _, or *. Each character, other than *, is treated literally and matches itself only. A * matches any string, including the empty string.

The following are examples of patterns:

assert: matches itself only
assert_eq: matches itself only
assertEqual: matches itself only
assert.Equal: matches itself only
assert.*: matches assert.Equal, but not assert, assert_eq, or assertEqual
assert*: matches assert, assert_eq, assertEqual, and assert.Equal
*.Equal: matches assert.Equal, but not Equal

Notes:

Patterns match paths, not individual identifiers.
. is treated literally like in a glob pattern, not like in a regular expression.
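The matching rules above can be sketched as a small Rust function. This is an illustrative implementation of the stated semantics (only * is special, and a pattern must match the whole path); pattern_matches is a hypothetical name, and this is not necessarily how Necessist implements matching.

```rust
/// Returns true if `pattern` matches all of `path`.
/// `*` matches any string (including the empty string); every other
/// character, including `.`, matches only itself.
fn pattern_matches(pattern: &str, path: &str) -> bool {
    let pattern: Vec<char> = pattern.chars().collect();
    let path: Vec<char> = path.chars().collect();

    // Wildcard matching with backtracking to the most recent `*`.
    let (mut p, mut s) = (0, 0);
    // (pattern index just after the last `*`, path index where that `*` currently stops)
    let mut star: Option<(usize, usize)> = None;

    while s < path.len() {
        if p < pattern.len() && pattern[p] == '*' {
            // Tentatively let `*` match the empty string.
            star = Some((p + 1, s));
            p += 1;
        } else if p < pattern.len() && pattern[p] == path[s] {
            p += 1;
            s += 1;
        } else if let Some((after_star, stop)) = star {
            // Let the last `*` absorb one more character and retry.
            p = after_star;
            s = stop + 1;
            star = Some((after_star, stop + 1));
        } else {
            return false;
        }
    }

    // Any pattern characters left over must all be `*`.
    pattern[p..].iter().all(|&c| c == '*')
}
```

With this sketch, pattern_matches("assert.*", "assert.Equal") is true, while pattern_matches("assert.*", "assert_eq") and pattern_matches("*.Equal", "Equal") are false, in line with the examples listed above.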

Paths

A path is a sequence of identifiers separated by periods (.). Consider this example (from Chainlink):

operator.connect(roles.oracleNode).signer.sendTransaction({
    to: operator.address,
    data,
}),

In the above, operator.connect and signer.sendTransaction are paths.

Note, however, that paths like operator.connect are ambiguous:

If operator refers to a package or module, then operator.connect refers to a function.
If operator refers to an object, then operator.connect refers to a method.

By default, Necessist ignores such a path if it matches either an ignored_functions or ignored_methods pattern. Setting the ignored_path_disambiguation option above to Function or Method causes Necessist to ignore the path only if it matches an ignored_functions or ignored_methods pattern (respectively).
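The decision for an ambiguous path can be summarized with a small sketch. The enum mirrors the three option values; the type and function names are hypothetical, and the two booleans stand in for whether the path matched an ignored_functions or ignored_methods pattern.

```rust
// Mirrors the values of the ignored_path_disambiguation option (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum IgnoredPathDisambiguation {
    Either, // the default
    Function,
    Method,
}

// Decides whether an ambiguous path such as `operator.connect` is ignored.
fn is_ambiguous_path_ignored(
    mode: IgnoredPathDisambiguation,
    matches_ignored_function: bool,
    matches_ignored_method: bool,
) -> bool {
    match mode {
        IgnoredPathDisambiguation::Either => matches_ignored_function || matches_ignored_method,
        IgnoredPathDisambiguation::Function => matches_ignored_function,
        IgnoredPathDisambiguation::Method => matches_ignored_method,
    }
}
```

For example, a path that matches only an ignored_methods pattern is ignored under Either and Method, but not under Function.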

Limitations

Slow. Modifying tests requires them to be rebuilt. Running Necessist on even moderately sized codebases can take several hours.
Triage requires intimate knowledge of the source code. Generally speaking, Necessist does not produce "obvious" bugs. In our experience, deciding whether a statement/method call should be necessary requires intimate knowledge of the code under test. Necessist is best run on codebases for which one has (or intends to have) such knowledge.

Semantic versioning policy

We reserve the right to change the following, and to consider such changes non-breaking:

the syntax that Necessist ignores by default

Changes to the following will be accompanied by a bump of at least Necessist's minor version:

the order in which removal candidates are output
the order in which records are stored in necessist.db

Goals

If a project uses a supported framework, then cding into the project's directory and typing necessist (with no arguments) should produce meaningful output.

Anti-goals

Become a general-purpose mutation testing tool. Good such tools already exist (e.g., universalmutator).

References

Groce, A., Ahmed, I., Jensen, C., McKenney, P.E., Holmes, J.: How verified (or tested) is my code? Falsification-driven verification and testing. Autom. Softw. Eng. 25, 917–960 (2018). A preprint is available. See Section 2.3.

License

Necessist is licensed and distributed under the AGPLv3 license. Contact us if you're looking for an exception to the terms.
