Benchmarks#

Setup#

First, install the Archery utility to run the benchmark suite.

Running the benchmark suite#

The benchmark suites can be run with the benchmark run sub-command.

    # Run benchmarks in the current git workspace
    archery benchmark run
    # Storing the results in a file
    archery benchmark run --output=run.json

Sometimes, it is required to pass custom CMake flags, e.g.

    export CC=clang-8 CXX=clang++-8
    archery benchmark run --cmake-extras="-DARROW_SIMD_LEVEL=NONE"

Additionally, a full CMake build directory may be specified.

    archery benchmark run $HOME/arrow/cpp/release-build
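When these invocations are automated, it can help to assemble the command line in one place. The following is a minimal sketch that shells out to the archery command using only the flags shown above; the helper names `benchmark_run_command` and `run_benchmarks` are hypothetical and are not part of archery.

```python
import os
import subprocess


def benchmark_run_command(output=None, cmake_extras=None, build_dir=None):
    """Build the argv list for `archery benchmark run` (flags as shown above)."""
    cmd = ["archery", "benchmark", "run"]
    if output is not None:
        cmd.append(f"--output={output}")
    if cmake_extras is not None:
        cmd.append(f"--cmake-extras={cmake_extras}")
    if build_dir is not None:
        # A full CMake build directory is passed as a positional argument.
        cmd.append(build_dir)
    return cmd


def run_benchmarks(cc=None, cxx=None, **kwargs):
    """Run the benchmark suite, optionally overriding the compilers."""
    env = dict(os.environ)
    if cc:
        env["CC"] = cc
    if cxx:
        env["CXX"] = cxx
    # check=True raises CalledProcessError if archery exits with an error.
    return subprocess.run(benchmark_run_command(**kwargs), env=env, check=True)
```

For instance, `run_benchmarks(cc="clang-8", cmake_extras="-DARROW_SIMD_LEVEL=NONE", output="run.json")` would run the suite with a custom compiler and CMake flag, assuming archery is installed and on the PATH.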

Comparison#

One goal with benchmarking is to detect performance regressions. To this end, archery implements a benchmark comparison facility via the benchmark diff sub-command.

In the default invocation, it will compare the current source (known as the current workspace in git) with the local main branch:

    archery --quiet benchmark diff --benchmark-filter=FloatParsing
    -----------------------------------------------------------------------------------
    Non-regressions: (1)
    -----------------------------------------------------------------------------------
                   benchmark            baseline           contender  change %  counters
     FloatParsing<FloatType>  105.983M items/sec  105.983M items/sec       0.0        {}

    ------------------------------------------------------------------------------------
    Regressions: (1)
    ------------------------------------------------------------------------------------
                    benchmark            baseline           contender  change %  counters
     FloatParsing<DoubleType>  209.941M items/sec  109.941M items/sec   -47.632        {}
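The change % column can be reproduced by hand as the relative difference between contender and baseline. The sketch below only illustrates that arithmetic and is not archery's implementation; the function names and the 5% threshold are assumptions chosen for the example, and it assumes a bigger-is-better metric such as items/sec.

```python
def percent_change(baseline: float, contender: float) -> float:
    """Relative change of contender versus baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (contender - baseline) / baseline * 100.0


def is_regression(baseline: float, contender: float, threshold_pct: float = 5.0) -> bool:
    # Assumption: throughput metric (bigger is better), so a drop larger
    # than the threshold counts as a regression. The 5% value is illustrative.
    return percent_change(baseline, contender) < -threshold_pct


# Values from the FloatParsing<DoubleType> row above (M items/sec).
change = percent_change(209.941, 109.941)
print(round(change, 3))  # -47.632
```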

For more information, invoke the archery benchmark diff --help command, which lists multiple examples of invocation.

Iterating efficiently#

Iterating on benchmark development can be a tedious process due to long build times and long run times. Multiple tricks can be used with archery benchmark diff to reduce this overhead.

First, the benchmark command supports comparing existing build directories. This can be paired with the --preserve flag to avoid rebuilding sources from zero.

    # The first invocation clones and checks out in a temporary directory. The
    # directory is preserved with --preserve
    archery benchmark diff --preserve
    # Modify C++ sources
    # Re-run benchmark in the previously created build directory.
    archery benchmark diff /tmp/arrow-bench*/{WORKSPACE,master}/build

Second, a benchmark run result can be saved in a JSON file. This avoids not only rebuilding the sources, but also executing the (sometimes) heavy benchmarks. This technique can be used as a poor man's cache.

    # Run the benchmarks on a given commit and save the result
    archery benchmark run --output=run-head-1.json HEAD~1
    # Compare the previous captured result with HEAD
    archery benchmark diff HEAD run-head-1.json

Third, the benchmark command supports filtering suites (--suite-filter) and benchmarks (--benchmark-filter). Both options support regular expressions.

    # Taking over a previous run, but only filtering for benchmarks matching
    # `Kernel` and suite matching `compute-aggregate`.
    archery benchmark diff \
      --suite-filter=compute-aggregate --benchmark-filter=Kernel \
      /tmp/arrow-bench*/{WORKSPACE,master}/build

Instead of rerunning benchmarks on comparison, a JSON file (generated by archery benchmark run) may be specified for the contender and/or the baseline.

    archery benchmark run --output=baseline.json $HOME/arrow/cpp/release-build
    git checkout some-feature
    archery benchmark run --output=contender.json $HOME/arrow/cpp/release-build
    archery benchmark diff contender.json baseline.json

Regression detection#

Writing a benchmark#

  1. The benchmark command will filter (by default) benchmarks with the regular expression ^Regression. This way, not all benchmarks are run by default. Thus, if you want your benchmark to be checked for regressions automatically, its name must match this expression.

  2. The benchmark command will run with the --benchmark_repetitions=K option for statistical significance. Thus, a benchmark should not override the repetitions in the (C++) benchmark's arguments definition.

  3. Due to #2, a benchmark should run sufficiently fast. Often, when the input does not fit in the CPU cache (L2/L3), the benchmark will be memory bound instead of CPU bound. In this case, the input can be downsized.

  4. By default, Google's benchmark library will use the cputime metric, which is the sum of the run time spent on the CPU by all threads of the process. By contrast, realtime is the wall clock time, i.e. the difference end_time - start_time. In a single-threaded model, cputime is preferable since it is less affected by context switching. In a multi-threaded scenario, cputime will give misleading results since it is inflated by the number of threads and can be far off realtime. Thus, if the benchmark is multi-threaded, it might be better to use SetRealtime(), see this example.
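The naming rule in item 1 can be checked quickly with a regular expression. The snippet below is a small sketch; the benchmark names are hypothetical and only serve to show which ones an anchored ^Regression pattern would select.

```python
import re

# Default filter described in item 1.
DEFAULT_FILTER = re.compile(r"^Regression")

# Hypothetical benchmark names, for illustration only.
names = ["RegressionSumKernel", "SumKernelRegression", "FloatParsing"]

# The ^ anchor requires the name to start with "Regression".
selected = [name for name in names if DEFAULT_FILTER.search(name)]
print(selected)  # ['RegressionSumKernel']
```

Note that a name merely containing "Regression" is not selected; the prefix has to come first.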

Scripting#

archery is written as a Python library with a command line frontend. The library can be imported to automate some tasks.

Some invocations of the command line interface can be quite verbose due to build output. This can be suppressed with the --quiet option, or the results can be written to a file with --output=<file>, e.g.

    archery benchmark diff --benchmark-filter=Kernel --output=compare.json ...
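A saved comparison can then be post-processed by a script. The sketch below assumes the output file holds one JSON object per line with "benchmark", "change" and "regression" keys; this layout is an assumption, not something stated above, so inspect the file produced by your archery version before relying on it. The helper name `load_regressions` is hypothetical.

```python
import json


def load_regressions(path):
    """Return (name, change) pairs for entries flagged as regressions.

    Assumption: one JSON object per line with "benchmark", "change" and
    "regression" keys. Adjust the key names to the actual file contents.
    """
    regressions = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            if entry.get("regression"):
                regressions.append((entry["benchmark"], entry["change"]))
    return regressions
```

A CI job could, for example, call `load_regressions("compare.json")` and fail when the returned list is not empty.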