Debugging code using Arrow#

GDB extension for Arrow C++#

By default, when asked to print the value of a C++ object, GDB displays the contents of its member variables. However, for C++ objects this often does not yield very useful output, as C++ classes tend to hide their implementation details behind methods and accessors.

For example, here is how an arrow::Status instance may be displayed by GDB:

$3 = {
  <arrow::util::EqualityComparable<arrow::Status>> = {<No data fields>},
  <arrow::util::ToStringOstreamable<arrow::Status>> = {<No data fields>},
  members of arrow::Status:
  state_ = 0x0
}

and here is an arrow::Decimal128Scalar:

$4 = (arrow::Decimal128Scalar) {
  <arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>> = {
    <arrow::internal::PrimitiveScalarBase> = {
      <arrow::Scalar> = {
        <arrow::util::EqualityComparable<arrow::Scalar>> = {<No data fields>},
        members of arrow::Scalar:
        _vptr.Scalar = 0x7ffff6870e78 <vtable for arrow::Decimal128Scalar+16>,
        type = std::shared_ptr<arrow::DataType> (use count 1, weak count 0) = {
          get() = 0x555555ce58a0
        },
        is_valid = true
      }, <No data fields>},
    members of arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>:
    value = {
      <arrow::BasicDecimal128> = {
        <arrow::GenericBasicDecimal<arrow::BasicDecimal128, 128, 2>> = {
          static kHighWordIndex = <optimized out>,
          static kBitWidth = 128,
          static kByteWidth = 16,
          static LittleEndianArray = <optimized out>,
          array_ = {
            _M_elems = {[0] = 1234567, [1] = 0}
          }
        },
        members of arrow::BasicDecimal128:
        static kMaxPrecision = 38,
        static kMaxScale = 38
      }, <No data fields>}
  }, <No data fields>}

Fortunately, GDB also allows custom extensions to override the default printing for specific types. We provide a GDB extension written in Python that enables pretty-printing for common Arrow C++ classes, so as to enable a more productive debugging experience. For example, here is how the aforementioned arrow::Status instance will be displayed:

$5 = arrow::Status::OK()

and here is the same arrow::Decimal128Scalar instance as above:

$6 = arrow::Decimal128Scalar of value 123.4567 [precision=10, scale=4]
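To illustrate the mechanism such an extension relies on, here is a minimal sketch of a GDB pretty-printer written against GDB's Python API. It is not taken from the Arrow extension: the `Point` struct, the `PointPrinter` class and the `lookup_point_printer` function are hypothetical names chosen for illustration. The `gdb` module only exists inside a GDB process, so the import is guarded.

```python
try:
    import gdb  # only importable when the script runs inside GDB
except ImportError:
    gdb = None


class PointPrinter:
    """Pretty-printer for a hypothetical `struct Point { int x; int y; };`."""

    def __init__(self, val):
        self.val = val  # a gdb.Value wrapping the C++ object

    def to_string(self):
        # gdb.Value supports field access by name and conversion to int.
        return f"Point(x={int(self.val['x'])}, y={int(self.val['y'])})"


def lookup_point_printer(val):
    """Return a printer for values of type Point, or None for other types."""
    type_name = val.type.strip_typedefs().tag
    if type_name == "Point":
        return PointPrinter(val)
    return None


if gdb is not None:
    # GDB calls each registered lookup function until one returns a printer.
    gdb.pretty_printers.append(lookup_point_printer)
```

Once such a script has been sourced, printing a `Point` value at the GDB prompt would show the string returned by `to_string()` instead of the raw member dump.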

Manual loading#

To enable the GDB extension for Arrow, you can simply download it somewhere on your computer and source it from the GDB prompt:

(gdb) source path/to/gdb_arrow.py

You will have to source it on each new GDB session. You might want to make this implicit by adding the source invocation to a gdbinit file.

Automatic loading#

GDB provides a facility to automatically load scripts or extensions for each object file or library that is involved in a debugging session. You will need to:

  1. Find out what the auto-load locations are for your GDB install. This can be determined using show subcommands on the GDB prompt; the answer will depend on the operating system.

    Here is an example on Ubuntu:

    (gdb) show auto-load scripts-directory
    List of directories from which to load auto-loaded scripts is $debugdir:$datadir/auto-load.
    (gdb) show data-directory
    GDB's data directory is "/usr/share/gdb".
    (gdb) show debug-file-directory
    The directory where separate debug symbols are searched for is "/usr/lib/debug".

    This tells you that the directories used for auto-loading are $debugdir and $datadir/auto-load, which expand to /usr/lib/debug/ and /usr/share/gdb/auto-load respectively.

  2. Find out the full path to the Arrow C++ DLL, with all symlinks resolved. For example, you might have installed Arrow 7.0 in /usr/local and the path to the Arrow C++ DLL could then be /usr/local/lib/libarrow.so.700.0.0.

  3. Determine the actual auto-load script path. It is computed by a) taking the path of the auto-load directory of your choice, b) appending the full path to the Arrow C++ DLL, c) appending -gdb.py at the tail.

    In the example above, if we choose /usr/share/gdb/auto-load as the auto-load directory, the full path to the auto-load script will have to be /usr/share/gdb/auto-load/usr/local/lib/libarrow.so.700.0.0-gdb.py.

  4. Either copy or symlink the GDB extension to the file path determined in step 3 above.

If everything went well, then as soon as GDB encounters the Arrow C++ DLL, it will automatically load the Arrow GDB extension so as to pretty-print Arrow C++ classes on the display prompt.
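The path computation in the steps above can be sketched in Python as follows. This is an illustrative helper, not part of Arrow or GDB: the function names are hypothetical, the `:` separator assumes a Unix-like system as in the Ubuntu example, and the directory values are the ones reported by the show commands in step 1.

```python
import os
import shutil


def expand_auto_load_dirs(scripts_directory, debugdir, datadir):
    """Expand $debugdir and $datadir in GDB's scripts-directory setting."""
    dirs = []
    for entry in scripts_directory.split(":"):  # assumes a Unix-like separator
        entry = entry.replace("$debugdir", debugdir)
        entry = entry.replace("$datadir", datadir)
        dirs.append(entry)
    return dirs


def resolve_library(path):
    """Step 2: full path to the library with all symlinks resolved."""
    return os.path.realpath(path)


def auto_load_script_path(auto_load_dir, resolved_library_path):
    """Step 3: auto-load directory + full library path + '-gdb.py'."""
    # Plain concatenation on purpose: the library path is absolute, so
    # os.path.join() would discard the auto-load directory.
    return auto_load_dir.rstrip("/") + resolved_library_path + "-gdb.py"


def install_extension(extension_path, script_path):
    """Step 4: copy the extension to the computed location."""
    os.makedirs(os.path.dirname(script_path), exist_ok=True)
    shutil.copyfile(extension_path, script_path)


dirs = expand_auto_load_dirs(
    "$debugdir:$datadir/auto-load", "/usr/lib/debug", "/usr/share/gdb"
)
print(dirs)
# ['/usr/lib/debug', '/usr/share/gdb/auto-load']

print(auto_load_script_path(dirs[1], "/usr/local/lib/libarrow.so.700.0.0"))
# /usr/share/gdb/auto-load/usr/local/lib/libarrow.so.700.0.0-gdb.py
```

Calling `install_extension()` with a target under a system directory such as /usr/share/gdb will typically require elevated privileges.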

Supported classes#

The Arrow GDB extension provides pretty-printing for the core Arrow C++ classes:

Important utility classes are also covered: