Development Guidelines#
This section provides information for developers who wish to contribute to the C++ codebase.
Note
Since most of the project’s developers work on Linux or macOS, not all features or developer tools are uniformly supported on Windows. If you are on Windows, have a look at Developing on Windows.
Compiler warning levels#
The BUILD_WARNING_LEVEL CMake option switches between sets of predetermined compiler warning levels that we use for code tidiness. For release builds, the default warning level is PRODUCTION, while for debug builds the default is CHECKIN.
When using CHECKIN for debug builds, -Werror is added when using gcc and clang, causing build failures for any warning, and /WX is set with MSVC, having the same effect.
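As a rough illustration of what "any warning fails the build" means in practice, the sketch below shows a small function kept free of a common diagnostic. It assumes the CHECKIN warning set reports unused local variables, which the text above does not spell out, and CountBelow is a hypothetical helper, not part of the codebase.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: counts the values strictly below `limit`.
inline int64_t CountBelow(const std::vector<int32_t>& values, int32_t limit) {
  int64_t count = 0;
  // A leftover local such as `int64_t total = 0;` that is never read would
  // typically be reported as an unused variable. Under -Werror (gcc, clang)
  // or /WX (MSVC) that report stops the build, so the fix is to delete it.
  for (int32_t v : values) {
    if (v < limit) ++count;
  }
  return count;
}
```

With a PRODUCTION build the same leftover variable would, under this assumption, at most produce a warning rather than a build failure.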
Running unit tests#
The -DARROW_BUILD_TESTS=ON CMake option enables building of unit test executables. You can then either run them individually, by launching the desired executable, or run them all at once by launching the ctest executable (which is part of the CMake suite).
A possible invocation is something like:
$ ctest -j16 --output-on-failure
where the -j16 option runs up to 16 tests in parallel, taking advantage of multiple CPU cores and hardware threads.
Running benchmarks#
The -DARROW_BUILD_BENCHMARKS=ON CMake option enables building of benchmark executables. You can then run benchmarks individually by launching the corresponding executable from the command line, e.g.:
$ ./build/release/arrow-builder-benchmark
Note
For meaningful benchmark numbers, it is very strongly recommended to build in Release mode, so as to enable compiler optimizations.
Code Style, Linting, and CI#
This project follows Google’s C++ Style Guide with these exceptions:
We relax the line length restriction to 90 characters.
We use the NULLPTR macro in header files (instead of nullptr) defined in src/arrow/util/macros.h to support building C++/CLI (ARROW-1134).
We relax the guide’s rules regarding structs. For public headers we should use struct only for objects that are principally simple data containers where it is OK to expose all the internal members and any methods are primarily conveniences. For private headers the rules are relaxed further and structs can be used where convenient for types that do not need access control even though they may not be simple data containers.
We prefer pointers for output and input/output parameters (the style guide recommends mutable references in some cases).
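The exceptions above can be sketched in a few lines. This is only an illustration: ValueRange and ComputeRange are hypothetical names, and NULLPTR is given a local stand-in definition so the snippet compiles on its own (in the codebase the macro comes from src/arrow/util/macros.h).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Local stand-in so the sketch is self-contained; not the project's definition.
#ifndef NULLPTR
#define NULLPTR nullptr
#endif

// Hypothetical public type: a simple data container, so a struct is acceptable.
struct ValueRange {
  int64_t min = 0;
  int64_t max = 0;
};

// Hypothetical helper. The output parameter is a pointer rather than a mutable
// reference, so a call site reads ComputeRange(values, &range).
// Returns false and leaves *out untouched when `values` is empty.
inline bool ComputeRange(const std::vector<int64_t>& values, ValueRange* out) {
  assert(out != NULLPTR);
  if (values.empty()) return false;
  ValueRange result;
  result.min = values[0];
  result.max = values[0];
  for (int64_t v : values) {
    if (v < result.min) result.min = v;
    if (v > result.max) result.max = v;
  }
  *out = result;
  return true;
}
```

Passing &range at the call site makes it visible that the argument may be modified, which is the motivation usually given for preferring pointers over mutable references.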
Our continuous integration builds on GitHub Actions run the unit test suites on a variety of platforms and configurations, including using Address Sanitizer and Undefined Behavior Sanitizer to check for various patterns of misbehaviour such as memory leaks. In addition, the codebase is subjected to a number of code style and code cleanliness checks.
In order to have a passing CI build, your modified Git branch must pass the following checks:
C++ builds with the project’s active version of clang without compiler warnings with -DBUILD_WARNING_LEVEL=CHECKIN. Note that there are classes of warnings (such as -Wdocumentation, see more on this below) that are not caught by gcc.
Passes various C++ (and others) style checks by running pre-commit run --show-diff-on-failure --color=always --all-files cpp.
On pull requests, the “Dev / Lint” pipeline will run these checks, and report what files/lines need to be fixed, if any.
Checking for ABI and API stability#
To build ABI compliance reports, you need to install the two tools abi-dumper and abi-compliance-checker.
Build Arrow C++ in Debug mode. Alternatively, you could use -Og, which also builds with the necessary symbols but includes a bit of code optimization. Once the build has finished, you can generate ABI reports using:
abi-dumper -lver 9 debug/libarrow.so -o ABI-9.dump
The above version number is freely selectable. As we want to compare versions, you should now git checkout the version you want to compare it to and re-run the above command using a different version number. Once both reports are generated, you can build a comparison report using
abi-compliance-checker -l libarrow -d1 ABI-9.dump -d2 ABI-10.dump
The report is then generated in compat_reports/libarrow as an HTML file.
API Documentation#
We use Doxygen style comments (///) in header files for comments that we wish to show up in API documentation for classes and functions.
When using clang and building with -DBUILD_WARNING_LEVEL=CHECKIN, the -Wdocumentation flag is used which checks for some common documentation inconsistencies, like documenting some, but not all function parameters with \param. See the LLVM documentation warnings section for more about this.
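A minimal sketch of a fully documented function follows. ClampToRange is a hypothetical example, not part of the Arrow API; the point is only that every parameter has its own \param entry, so removing one of those lines would be the kind of inconsistency described above.

```cpp
#include <cassert>
#include <cstdint>

/// \brief Clamp a value to an inclusive range
///
/// Hypothetical function for illustration only.
///
/// \param[in] value the value to clamp
/// \param[in] lower the smallest value that may be returned
/// \param[in] upper the largest value that may be returned, must be >= lower
/// \return value limited to the range [lower, upper]
inline int64_t ClampToRange(int64_t value, int64_t lower, int64_t upper) {
  assert(lower <= upper);
  if (value < lower) return lower;
  if (value > upper) return upper;
  return value;
}
```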
While we publish the API documentation as part of the main Sphinx-based documentation site, you can also build the C++ API documentation anytime using Doxygen. Run the following command from the cpp/apidoc directory:
doxygen Doxyfile
This requires Doxygen to be installed.
Apache Parquet Development#
To build the C++ libraries for Apache Parquet, add the flag -DARROW_PARQUET=ON when invoking CMake. To build Apache Parquet with encryption support, add the flag -DPARQUET_REQUIRE_ENCRYPTION=ON when invoking CMake. The Parquet libraries and unit tests can be built with the parquet make target:
make parquet
On Linux and macOS, if you do not have Apache Thrift installed on your system, or you are building with -DThrift_SOURCE=BUNDLED, you must install the bison and flex packages. On Windows we handle these build dependencies automatically when building Thrift from source.
Running ctest -L unittest will run all built C++ unit tests, while ctest -L parquet will run only the Parquet unit tests. The unit tests depend on an environment variable PARQUET_TEST_DATA that depends on a git submodule to the repository apache/parquet-testing:
git submodule update --init
export PARQUET_TEST_DATA=$ARROW_ROOT/cpp/submodules/parquet-testing/data
Here $ARROW_ROOT is the absolute path to the Arrow codebase.
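To show how a test might consume that variable, here is a small sketch that reads PARQUET_TEST_DATA with std::getenv and reports whether it is set. GetParquetTestDataDir is a hypothetical helper written for this illustration; it is not the utility the Parquet tests actually use.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: copies the value of PARQUET_TEST_DATA into *out.
// Returns false and leaves *out untouched when the variable is unset or empty.
inline bool GetParquetTestDataDir(std::string* out) {
  assert(out != nullptr);
  const char* dir = std::getenv("PARQUET_TEST_DATA");
  if (dir == nullptr || dir[0] == '\0') return false;
  *out = dir;
  return true;
}
```

A test built along these lines could stop early with a clear message when the function returns false, rather than failing later on a missing data file.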
Arrow Flight RPC#
In addition to the Arrow dependencies, Flight requires:
gRPC (>= 1.14, roughly)
Protobuf (>= 3.6, earlier versions may work)
c-ares (used by gRPC)
By default, Arrow will try to download and build these dependencies when building Flight.
The optional flight libraries and tests can be built by passing -DARROW_FLIGHT=ON.
cmake .. -DARROW_FLIGHT=ON -DARROW_BUILD_TESTS=ON
make
You can also use existing installations of the extra dependencies. When building, set the environment variables gRPC_ROOT and/or Protobuf_ROOT and/or c-ares_ROOT.
We are developing against recent versions of gRPC. The libgrpc package available from https://conda-forge.org/ is one reliable way to obtain gRPC in a cross-platform way. You may try using system libraries for gRPC and Protobuf, but these are likely to be too old. On macOS, you can try Homebrew:
brew install grpc

