This repository was archived by the owner on Mar 22, 2024. It is now read-only.

google/AFL (public archive)

american fuzzy lop - a security-oriented fuzzer

lcamtuf.coredump.cx/afl/

License

Apache-2.0 license

american fuzzy lop

Originally developed by Michal Zalewski <lcamtuf@google.com>.

See QuickStartGuide.txt if you don't have time to read this file.

1) Challenges of guided fuzzing

Fuzzing is one of the most powerful and proven strategies for identifying security issues in real-world software; it is responsible for the vast majority of remote code execution and privilege escalation bugs found to date in security-critical software.

Unfortunately, fuzzing is also relatively shallow; blind, random mutations make it very unlikely to reach certain code paths in the tested code, leaving some vulnerabilities firmly outside the reach of this technique.

There have been numerous attempts to solve this problem. One of the early approaches - pioneered by Tavis Ormandy - is corpus distillation. The method relies on coverage signals to select a subset of interesting seeds from a massive, high-quality corpus of candidate files, and then fuzz them by traditional means. The approach works exceptionally well, but requires such a corpus to be readily available. In addition, block coverage measurements provide only a very simplistic understanding of program state, and are less useful for guiding the fuzzing effort in the long haul.

Other, more sophisticated research has focused on techniques such as program flow analysis ("concolic execution"), symbolic execution, or static analysis. All these methods are extremely promising in experimental settings, but tend to suffer from reliability and performance problems in practical uses - and currently do not offer a viable alternative to "dumb" fuzzing techniques.

2) The afl-fuzz approach

American Fuzzy Lop is a brute-force fuzzer coupled with an exceedingly simple but rock-solid instrumentation-guided genetic algorithm. It uses a modified form of edge coverage to effortlessly pick up subtle, local-scale changes to program control flow.
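To make "edge coverage" a bit more concrete, here is a minimal sketch along the lines of the scheme outlined in technical_details.txt: every basic block gets a compile-time random ID, and the injected code bumps a counter indexed by the current ID XORed with the shifted ID of the previous block. The function names, the map size constant, and the block IDs used here are illustrative assumptions; this is not the code that the AFL wrappers actually inject.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE 65536u  /* assumed map size for this sketch (64 kB) */

static uint8_t  shared_mem[MAP_SIZE];  /* per-edge hit counters */
static uint32_t prev_location;         /* shifted ID of the previous block */

/* Map index used for the edge prev_block -> cur_block. The right shift
   keeps A -> B distinct from B -> A, and A -> A distinct from B -> B. */
static uint32_t edge_index(uint32_t prev_block, uint32_t cur_block) {
  return (cur_block ^ (prev_block >> 1)) % MAP_SIZE;
}

/* What the injected code conceptually does on entry to a basic block. */
static void visit_block(uint32_t cur_location) {
  shared_mem[(cur_location ^ prev_location) % MAP_SIZE]++;
  prev_location = cur_location >> 1;
}

/* Clear the map before each run of the target. */
static void reset_trace(void) {
  memset(shared_mem, 0, sizeof(shared_mem));
  prev_location = 0;
}
```

Because the index depends on both the previous and the current block, two runs that visit the same blocks in a different order end up touching different map entries, which is what lets the fuzzer notice local changes in control flow.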

Simplifying a bit, the overall algorithm can be summed up as:

1. Load user-supplied initial test cases into the queue,
2. Take next input file from the queue,
3. Attempt to trim the test case to the smallest size that doesn't alter the measured behavior of the program,
4. Repeatedly mutate the file using a balanced and well-researched variety of traditional fuzzing strategies,
5. If any of the generated mutations resulted in a new state transition recorded by the instrumentation, add mutated output as a new entry in the queue.
6. Go to 2.
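The loop above can be sketched in a few dozen lines of C. Everything here is a toy under stated assumptions: toy_target() is a hypothetical stand-in for an instrumented program, the coverage map has eight slots, the only mutation strategy is a walking single-bit flip, trimming (step 3) and culling are left out, and the loop stops once every queue entry has been processed instead of cycling forever. It is not how afl-fuzz is implemented.

```c
#include <stdint.h>
#include <string.h>

#define MAP_SIZE  8    /* toy coverage map; real maps are far larger */
#define MAX_QUEUE 16
#define INPUT_LEN 2

typedef struct { uint8_t data[INPUT_LEN]; } entry_t;

/* Hypothetical stand-in for an instrumented target: it marks one map
   slot per branch taken. */
static void toy_target(const uint8_t *buf, uint8_t *trace) {
  memset(trace, 0, MAP_SIZE);
  trace[0] = 1;
  if (buf[0] == 'F') {
    trace[1] = 1;
    if (buf[1] == 'U') trace[2] = 1;
  }
}

/* Step 5: report whether the trace touches a slot never seen before,
   and remember every slot it touches. */
static int has_new_coverage(uint8_t *seen, const uint8_t *trace) {
  int found = 0;
  for (int i = 0; i < MAP_SIZE; i++) {
    if (trace[i] && !seen[i]) { seen[i] = 1; found = 1; }
  }
  return found;
}

/* Runs the loop on a single seed and returns the final queue length. */
static int run_toy_fuzzer(const uint8_t *seed) {
  entry_t queue[MAX_QUEUE];
  uint8_t seen[MAP_SIZE] = {0}, trace[MAP_SIZE];
  int queue_len = 0;

  /* Step 1: load the seed and record its coverage. */
  memcpy(queue[queue_len++].data, seed, INPUT_LEN);
  toy_target(seed, trace);
  has_new_coverage(seen, trace);

  /* Steps 2 and 6: walk the queue until no unprocessed entries remain. */
  for (int cur = 0; cur < queue_len; cur++) {
    /* Step 4: one simple strategy, flipping each bit in turn. */
    for (int bit = 0; bit < INPUT_LEN * 8; bit++) {
      uint8_t mutated[INPUT_LEN];
      memcpy(mutated, queue[cur].data, INPUT_LEN);
      mutated[bit >> 3] ^= (uint8_t)(1u << (bit & 7));

      toy_target(mutated, trace);
      if (has_new_coverage(seen, trace) && queue_len < MAX_QUEUE)
        memcpy(queue[queue_len++].data, mutated, INPUT_LEN);
    }
  }
  return queue_len;
}
```

Starting from the seed "GT", a single bit flip yields "FT", which reaches a new branch and is queued; mutating "FT" in turn yields "FU", which reaches the innermost branch. Neither step would be likely with blind mutation of the original seed alone, which is the point of feeding interesting mutations back into the queue.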

The discovered test cases are also periodically culled to eliminate ones that have been obsoleted by newer, higher-coverage finds; and undergo several other instrumentation-driven effort minimization steps.

As a side result of the fuzzing process, the tool creates a small, self-contained corpus of interesting test cases. These are extremely useful for seeding other, labor- or resource-intensive testing regimes - for example, for stress-testing browsers, office applications, graphics suites, or closed-source tools.

The fuzzer is thoroughly tested to deliver out-of-the-box performance far superior to blind fuzzing or coverage-only tools.

3) Instrumenting programs for use with AFL

When source code is available, instrumentation can be injected by a companion tool that works as a drop-in replacement for gcc or clang in any standard build process for third-party code.

The instrumentation has a fairly modest performance impact; in conjunction with other optimizations implemented by afl-fuzz, most programs can be fuzzed as fast or even faster than possible with traditional tools.

The correct way to recompile the target program may vary depending on the specifics of the build process, but a nearly-universal approach would be:

$ CC=/path/to/afl/afl-gcc ./configure
$ make clean all

For C++ programs, you'd also want to set CXX=/path/to/afl/afl-g++.

The clang wrappers (afl-clang and afl-clang++) can be used in the same way; clang users may also opt to leverage a higher-performance instrumentation mode, as described in llvm_mode/README.llvm.

When testing libraries, you need to find or write a simple program that reads data from stdin or from a file and passes it to the tested library. In such a case, it is essential to link this executable against a static version of the instrumented library, or to make sure that the correct .so file is loaded at runtime (usually by setting LD_LIBRARY_PATH). The simplest option is a static build, usually possible via:

$ CC=/path/to/afl/afl-gcc ./configure --disable-shared
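As a rough illustration of the "simple program" mentioned above, the sketch below reads an entire stream into memory and hands it to a library call. Note that parse_blob() is a hypothetical placeholder, not part of AFL or of any real library; substitute the entry point of the library you are testing. The helper names read_all() and fuzz_one() are likewise made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads everything from fp into a malloc'd buffer. Returns NULL on error;
   on success stores the length in *len and the caller frees the buffer. */
static unsigned char *read_all(FILE *fp, size_t *len) {
  size_t cap = 4096, used = 0;
  unsigned char *buf = malloc(cap);
  if (!buf) return NULL;

  for (;;) {
    size_t n = fread(buf + used, 1, cap - used, fp);
    used += n;
    if (n == 0) break;                 /* end of input or read error */
    if (used == cap) {                 /* buffer full: double it */
      unsigned char *bigger = realloc(buf, cap * 2);
      if (!bigger) { free(buf); return NULL; }
      buf = bigger;
      cap *= 2;
    }
  }
  if (ferror(fp)) { free(buf); return NULL; }
  *len = used;
  return buf;
}

/* Placeholder for the library call under test (hypothetical name). */
static int parse_blob(const unsigned char *data, size_t len) {
  (void)data;
  return len > 0 ? 0 : -1;
}

/* Harness body: feeds one input to the library. Returns 0 if the input
   was read and passed on, 1 if it could not be read. */
static int fuzz_one(FILE *fp) {
  size_t len = 0;
  unsigned char *data = read_all(fp, &len);
  if (!data) return 1;
  parse_blob(data, len);
  free(data);
  return 0;
}
```

In a real harness, main() would call fuzz_one(stdin) when the target reads standard input, or open the file named on the command line and pass that stream instead.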

Setting AFL_HARDEN=1 when calling 'make' will cause the CC wrapper to automatically enable code hardening options that make it easier to detect simple memory bugs. Libdislocator, a helper library included with AFL (see libdislocator/README.dislocator) can help uncover heap corruption issues, too.

PS. ASAN users are advised to review the notes_for_asan.txt file for important caveats.

4) Instrumenting binary-only apps

When source code is NOT available, the fuzzer offers experimental support for fast, on-the-fly instrumentation of black-box binaries. This is accomplished with a version of QEMU running in the lesser-known "user space emulation" mode.

QEMU is a project separate from AFL, but you can conveniently build the feature by doing:

$ cd qemu_mode
$ ./build_qemu_support.sh

For additional instructions and caveats, see qemu_mode/README.qemu.

The mode is approximately 2-5x slower than compile-time instrumentation, is less conducive to parallelization, and may have some other quirks.

5) Choosing initial test cases

To operate correctly, the fuzzer requires one or more starting files that contain a good example of the input data normally expected by the targeted application. There are two basic rules:

- Keep the files small. Under 1 kB is ideal, although not strictly necessary. For a discussion of why size matters, see perf_tips.txt.
- Use multiple test cases only if they are functionally different from each other. There is no point in using fifty different vacation photos to fuzz an image library.

You can find many good examples of starting files in the testcases/ subdirectory that comes with this tool.

PS. If a large corpus of data is available for screening, you may want to use the afl-cmin utility to identify a subset of functionally distinct files that exercise different code paths in the target binary.

6) Fuzzing binaries

The fuzzing process itself is carried out by the afl-fuzz utility. This program requires a read-only directory with initial test cases, a separate place to store its findings, plus a path to the binary to test.

For target binaries that accept input directly from stdin, the usual syntax is:

$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program [...params...]

For programs that take input from a file, use '@@' to mark the location in the target's command line where the input file name should be placed. The fuzzer will substitute this for you:

$ ./afl-fuzz -i testcase_dir -o findings_dir /path/to/program @@

You can also use the -f option to have the mutated data written to a specific file. This is useful if, for example, the program expects a particular file extension.

Non-instrumented binaries can be fuzzed in the QEMU mode (add -Q in the command line) or in a traditional, blind-fuzzer mode (specify -n).

You can use -t and -m to override the default timeout and memory limit for the executed process; rare examples of targets that may need these settings touched include compilers and video decoders.

Tips for optimizing fuzzing performance are discussed in perf_tips.txt.

Note that afl-fuzz starts by performing an array of deterministic fuzzing steps, which can take several days, but tend to produce neat test cases. If you want quick & dirty results right away - akin to zzuf and other traditional fuzzers - add the -d option to the command line.

7) Interpreting output

See the status_screen.txt file for information on how to interpret the displayed stats and monitor the health of the process. Be sure to consult this file especially if any UI elements are highlighted in red.

The fuzzing process will continue until you press Ctrl-C. At minimum, you want to allow the fuzzer to complete one queue cycle, which may take anywhere from a couple of hours to a week or so.

There are three subdirectories created within the output directory and updated in real time:

- queue/ - test cases for every distinctive execution path, plus all the starting files given by the user. This is the synthesized corpus mentioned in section 2. Before using this corpus for any other purposes, you can shrink it to a smaller size using the afl-cmin tool. The tool will find a smaller subset of files offering equivalent edge coverage.
- crashes/ - unique test cases that cause the tested program to receive a fatal signal (e.g., SIGSEGV, SIGILL, SIGABRT). The entries are grouped by the received signal.
- hangs/ - unique test cases that cause the tested program to time out. The default time limit before something is classified as a hang is the larger of 1 second and the value of the -t parameter. The value can be fine-tuned by setting AFL_HANG_TMOUT, but this is rarely necessary.

Crashes and hangs are considered "unique" if the associated execution paths involve any state transitions not seen in previously-recorded faults. If a single bug can be reached in multiple ways, there will be some count inflation early in the process, but this should quickly taper off.

The file names for crashes and hangs are correlated with parent, non-faulting queue entries. This should help with debugging.

When you can't reproduce a crash found by afl-fuzz, the most likely cause is that you are not setting the same memory limit as used by the tool. Try:

$ LIMIT_MB=50
$ ( ulimit -Sv $[LIMIT_MB << 10]; /path/to/tested_binary ... )

Change LIMIT_MB to match the -m parameter passed to afl-fuzz. On OpenBSD, also change -Sv to -Sd.
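If you would rather reproduce the limit from a small C launcher than from the shell, the sketch below does roughly the same thing with setrlimit(). It is only an approximation of the shell snippet above, written for illustration: the helper names are made up, and the fallback to RLIMIT_DATA on systems without RLIMIT_AS is an assumption that mirrors the -Sv versus -Sd distinction.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

#ifdef RLIMIT_AS
#  define MEM_RESOURCE RLIMIT_AS    /* address space, like ulimit -v */
#else
#  define MEM_RESOURCE RLIMIT_DATA  /* data segment, like ulimit -d */
#endif

/* Converts a limit in megabytes to bytes, the unit setrlimit() expects.
   (The shell's ulimit takes kilobytes, hence the << 10 above.) */
rlim_t mb_to_bytes(unsigned mb) {
  return (rlim_t)mb << 20;
}

/* Lowers the soft memory limit of the calling process and then replaces
   the process with the target. Returns -1 only if something failed. */
int run_with_limit(unsigned limit_mb, char *const argv[]) {
  struct rlimit r;
  if (getrlimit(MEM_RESOURCE, &r) != 0) return -1;
  r.rlim_cur = mb_to_bytes(limit_mb);  /* soft limit only, like -S */
  if (setrlimit(MEM_RESOURCE, &r) != 0) return -1;

  execv(argv[0], argv);
  perror("execv");                     /* reached only if execv failed */
  return -1;
}
```

A launcher built around this would pass the same number of megabytes that was given to afl-fuzz via -m, followed by the path and arguments of the tested binary.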

Any existing output directory can be also used to resume aborted jobs; try:

$ ./afl-fuzz -i- -o existing_output_dir [...etc...]

If you have gnuplot installed, you can also generate some pretty graphs for any active fuzzing task using afl-plot. For an example of what this looks like, see http://lcamtuf.coredump.cx/afl/plot/.

8) Parallelized fuzzing

Every instance of afl-fuzz takes up roughly one core. This means that on multi-core systems, parallelization is necessary to fully utilize the hardware. For tips on how to fuzz a common target on multiple cores or multiple networked machines, please refer to parallel_fuzzing.txt.

The parallel fuzzing mode also offers a simple way for interfacing AFL to other fuzzers, to symbolic or concolic execution engines, and so forth; again, see the last section of parallel_fuzzing.txt for tips.

9) Fuzzer dictionaries

By default, the afl-fuzz mutation engine is optimized for compact data formats - say, images, multimedia, compressed data, regular expression syntax, or shell scripts. It is somewhat less suited for languages with particularly verbose and redundant verbiage - notably including HTML, SQL, or JavaScript.

To avoid the hassle of building syntax-aware tools, afl-fuzz provides a way to seed the fuzzing process with an optional dictionary of language keywords, magic headers, or other special tokens associated with the targeted data type - and use that to reconstruct the underlying grammar on the go:

http://lcamtuf.blogspot.com/2015/01/afl-fuzz-making-up-grammar-with.html

To use this feature, you first need to create a dictionary in one of the two formats discussed in dictionaries/README.dictionaries; and then point the fuzzer to it via the -x option in the command line.

(Several common dictionaries are already provided in that subdirectory, too.)

There is no way to provide more structured descriptions of the underlying syntax, but the fuzzer will likely figure out some of this based on the instrumentation feedback alone. This actually works in practice, say:

http://lcamtuf.blogspot.com/2015/04/finding-bugs-in-sqlite-easy-way.html

PS. Even when no explicit dictionary is given, afl-fuzz will try to extract existing syntax tokens in the input corpus by watching the instrumentation very closely during deterministic byte flips. This works for some types of parsers and grammars, but isn't nearly as good as the -x mode.

If a dictionary is really hard to come by, another option is to let AFL run for a while, and then use the token capture library that comes as a companion utility with AFL. For that, see libtokencap/README.tokencap.

10) Crash triage

The coverage-based grouping of crashes usually produces a small data set that can be quickly triaged manually or with a very simple GDB or Valgrind script. Every crash is also traceable to its parent non-crashing test case in the queue, making it easier to diagnose faults.

Having said that, it's important to acknowledge that some fuzzing crashes can be difficult to quickly evaluate for exploitability without a lot of debugging and code analysis work. To assist with this task, afl-fuzz supports a unique "crash exploration" mode enabled with the -C flag.

In this mode, the fuzzer takes one or more crashing test cases as the input, and uses its feedback-driven fuzzing strategies to very quickly enumerate all code paths that can be reached in the program while keeping it in the crashing state.

Mutations that do not result in a crash are rejected; so are any changes that do not affect the execution path.

The output is a small corpus of files that can be very rapidly examined to see what degree of control the attacker has over the faulting address, or whether it is possible to get past an initial out-of-bounds read - and see what lies beneath.

Oh, one more thing: for test case minimization, give afl-tmin a try. The tool can be operated in a very simple way:

$ ./afl-tmin -i test_case -o minimized_result -- /path/to/program [...]

The tool works with crashing and non-crashing test cases alike. In the crash mode, it will happily accept instrumented and non-instrumented binaries. In the non-crashing mode, the minimizer relies on standard AFL instrumentation to make the file simpler without altering the execution path.

The minimizer accepts the -m, -t, -f and @@ syntax in a manner compatible with afl-fuzz.

Another recent addition to AFL is the afl-analyze tool. It takes an input file, attempts to sequentially flip bytes, and observes the behavior of the tested program. It then color-codes the input based on which sections appear to be critical, and which are not; while not bulletproof, it can often offer quick insights into complex file formats. More info about its operation can be found near the end of technical_details.txt.
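The basic idea behind this kind of analysis can be sketched as follows. This is a deliberately crude illustration, not the algorithm used by afl-analyze: toy_path() is a hypothetical target that merely checks for a two-byte magic value, and the classification is a simple two-way split between bytes whose modification changes the observed path and bytes where no effect is seen.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target: accepts inputs that start with the magic "AF" and
   returns an ID for the path taken (0 = rejected, 1 = accepted). */
static int toy_path(const uint8_t *buf, size_t len) {
  if (len < 2 || buf[0] != 'A' || buf[1] != 'F') return 0;
  return 1;
}

/* Flips each byte in turn and compares the resulting path with the
   baseline. Writes 'C' (critical) or '.' (no observed effect) per byte
   into out, which must hold len + 1 chars. Returns 0 on success. */
static int classify_bytes(const uint8_t *in, size_t len, char *out) {
  uint8_t *work = malloc(len ? len : 1);
  if (!work) return -1;
  memcpy(work, in, len);

  int baseline = toy_path(in, len);
  for (size_t i = 0; i < len; i++) {
    work[i] ^= 0xFF;                   /* flip every bit of byte i */
    out[i] = (toy_path(work, len) != baseline) ? 'C' : '.';
    work[i] ^= 0xFF;                   /* restore the original byte */
  }
  out[len] = '\0';
  free(work);
  return 0;
}
```

For the input "AFxyz", only the two magic bytes are flagged, since the toy target never looks at the rest of the file.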

11) Going beyond crashes

Fuzzing is a wonderful and underutilized technique for discovering non-crashing design and implementation errors, too. Quite a few interesting bugs have been found by modifying the target programs to call abort() when, say:

- Two bignum libraries produce different outputs when given the same fuzzer-generated input,
- An image library produces different outputs when asked to decode the same input image several times in a row,
- A serialization / deserialization library fails to produce stable outputs when iteratively serializing and deserializing fuzzer-supplied data,
- A compression library produces an output inconsistent with the input file when asked to compress and then decompress a particular blob.

Implementing these or similar sanity checks usually takes very little time; if you are the maintainer of a particular package, you can make this code conditional with #ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION (a flag also shared with libfuzzer) or #ifdef __AFL_COMPILER (this one is just for AFL).
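As an example of the last kind of check, here is a minimal sketch of a compress-then-decompress round trip guarded by the flag mentioned above. The run-length codec is a toy written for this example and stands in for a real compression library; the function names are made up, too.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Toy run-length encoder: emits (count, byte) pairs. The out buffer must
   hold 2 * len bytes. Returns the number of bytes written. */
static size_t rle_encode(const uint8_t *in, size_t len, uint8_t *out) {
  size_t o = 0, i = 0;
  while (i < len) {
    size_t run = 1;
    while (i + run < len && in[i + run] == in[i] && run < 255) run++;
    out[o++] = (uint8_t)run;
    out[o++] = in[i];
    i += run;
  }
  return o;
}

/* Matching decoder. Returns the number of bytes written, or 0 if the
   output would not fit in cap bytes. */
static size_t rle_decode(const uint8_t *in, size_t len,
                         uint8_t *out, size_t cap) {
  size_t o = 0;
  for (size_t i = 0; i + 1 < len; i += 2) {
    if (in[i] > cap - o) return 0;
    memset(out + o, in[i + 1], in[i]);
    o += in[i];
  }
  return o;
}

/* Returns 1 if decode(encode(x)) == x. In fuzzing builds, a mismatch
   calls abort() so that the fuzzer records it as a crash. */
static int check_roundtrip(const uint8_t *in, size_t len) {
  if (len == 0) return 1;
  uint8_t *enc = malloc(2 * len), *dec = malloc(len);
  if (!enc || !dec) { free(enc); free(dec); return 1; }  /* cannot check */

  size_t enc_len = rle_encode(in, len, enc);
  size_t dec_len = rle_decode(enc, enc_len, dec, len);
  int ok = (dec_len == len && memcmp(dec, in, len) == 0);
  free(enc);
  free(dec);

#ifdef FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
  if (!ok) abort();
#endif
  return ok;
}
```

In a regular build the check simply returns a status; when compiled with -DFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION, any inconsistency turns into a SIGABRT that shows up in the crashes/ directory.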

12) Common-sense risks

Please keep in mind that, similarly to many other computationally-intensive tasks, fuzzing may put strain on your hardware and on the OS. In particular:

- Your CPU will run hot and will need adequate cooling. In most cases, if cooling is insufficient or stops working properly, CPU speeds will be automatically throttled. That said, especially when fuzzing on less suitable hardware (laptops, smartphones, etc), it's not entirely impossible for something to blow up.
- Targeted programs may end up erratically grabbing gigabytes of memory or filling up disk space with junk files. AFL tries to enforce basic memory limits, but can't prevent each and every possible mishap. The bottom line is that you shouldn't be fuzzing on systems where the prospect of data loss is not an acceptable risk.
- Fuzzing involves billions of reads and writes to the filesystem. On modern systems, this will be usually heavily cached, resulting in fairly modest "physical" I/O - but there are many factors that may alter this equation. It is your responsibility to monitor for potential trouble; with very heavy I/O, the lifespan of many HDDs and SSDs may be reduced.

  A good way to monitor disk I/O on Linux is the 'iostat' command:

    $ iostat -d 3 -x -k [...optional disk ID...]

13) Known limitations & areas for improvement

Here are some of the most important caveats for AFL:

- AFL detects faults by checking for the first spawned process dying due to a signal (SIGSEGV, SIGABRT, etc). Programs that install custom handlers for these signals may need to have the relevant code commented out. In the same vein, faults in child processes spawned by the fuzzed target may evade detection unless you manually add some code to catch that.
- As with any other brute-force tool, the fuzzer offers limited coverage if encryption, checksums, cryptographic signatures, or compression are used to wholly wrap the actual data format to be tested.
  To work around this, you can comment out the relevant checks (see experimental/libpng_no_checksum/ for inspiration); if this is not possible, you can also write a postprocessor, as explained in experimental/post_library/.
- There are some unfortunate trade-offs with ASAN and 64-bit binaries. This isn't due to any specific fault of afl-fuzz; see notes_for_asan.txt for tips.
- There is no direct support for fuzzing network services, background daemons, or interactive apps that require UI interaction to work. You may need to make simple code changes to make them behave in a more traditional way. Preeny may offer a relatively simple option, too - see: https://github.com/zardus/preeny
  Some useful tips for modifying network-based services can be also found at: https://www.fastly.com/blog/how-to-fuzz-server-american-fuzzy-lop
- AFL doesn't output human-readable coverage data. If you want to monitor coverage, use afl-cov from Michael Rash: https://github.com/mrash/afl-cov
- Occasionally, sentient machines rise against their creators. If this happens to you, please consult http://lcamtuf.coredump.cx/prep/.

Beyond this, see INSTALL for platform-specific tips.

14) Special thanks

Many of the improvements to afl-fuzz wouldn't be possible without feedback, bug reports, or patches from:

  Jann Horn                             Hanno Boeck
  Felix Groebert                        Jakub Wilk
  Richard W. M. Jones                   Alexander Cherepanov
  Tom Ritter                            Hovik Manucharyan
  Sebastian Roschke                     Eberhard Mattes
  Padraig Brady                         Ben Laurie
  @dronesec                             Luca Barbato
  Tobias Ospelt                         Thomas Jarosch
  Martin Carpenter                      Mudge Zatko
  Joe Zbiciak                           Ryan Govostes
  Michael Rash                          William Robinet
  Jonathan Gray                         Filipe Cabecinhas
  Nico Weber                            Jodie Cunningham
  Andrew Griffiths                      Parker Thompson
  Jonathan Neuschfer                    Tyler Nighswander
  Ben Nagy                              Samir Aguiar
  Aidan Thornton                        Aleksandar Nikolich
  Sam Hakim                             Laszlo Szekeres
  David A. Wheeler                      Turo Lamminen
  Andreas Stieger                       Richard Godbee
  Louis Dassy                           teor2345
  Alex Moneger                          Dmitry Vyukov
  Keegan McAllister                     Kostya Serebryany
  Richo Healey                          Martijn Bogaard
  rc0r                                  Jonathan Foote
  Christian Holler                      Dominique Pelle
  Jacek Wielemborek                     Leo Barnes
  Jeremy Barnes                         Jeff Trull
  Guillaume Endignoux                   ilovezfs
  Daniel Godas-Lopez                    Franjo Ivancic
  Austin Seipp                          Daniel Komaromy
  Daniel Binderman                      Jonathan Metzman
  Vegard Nossum                         Jan Kneschke
  Kurt Roeckx                           Marcel Bohme
  Van-Thuan Pham                        Abhik Roychoudhury
  Joshua J. Drake                       Toby Hutton
  Rene Freingruber                      Sergey Davidoff
  Sami Liedes                           Craig Young
  Andrzej Jackowski                     Daniel Hodson

Thank you!

15) Contact

Questions? Concerns? Bug reports? Please use GitHub.

There is also a mailing list for the project; to join, send a mail to afl-users+subscribe@googlegroups.com. Or, if you prefer to browse archives first, try: https://groups.google.com/group/afl-users.