Debugging code using Arrow
GDB extension for Arrow C++
By default, when asked to print the value of a C++ object, GDB displays the contents of its member variables. However, this often does not yield very useful output, as C++ classes tend to hide their implementation details behind methods and accessors.
For example, here is how an arrow::Status instance may be displayed by GDB:
$3 = {
  <arrow::util::EqualityComparable<arrow::Status>> = {<No data fields>},
  <arrow::util::ToStringOstreamable<arrow::Status>> = {<No data fields>},
  members of arrow::Status:
  state_ = 0x0
}
and here is an arrow::Decimal128Scalar:
$4 = (arrow::Decimal128Scalar) {
  <arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>> = {
    <arrow::internal::PrimitiveScalarBase> = {
      <arrow::Scalar> = {
        <arrow::util::EqualityComparable<arrow::Scalar>> = {<No data fields>},
        members of arrow::Scalar:
        _vptr.Scalar = 0x7ffff6870e78 <vtable for arrow::Decimal128Scalar+16>,
        type = std::shared_ptr<arrow::DataType> (use count 1, weak count 0) = {
          get() = 0x555555ce58a0
        },
        is_valid = true
      }, <No data fields>},
    members of arrow::DecimalScalar<arrow::Decimal128Type, arrow::Decimal128>:
    value = {
      <arrow::BasicDecimal128> = {
        <arrow::GenericBasicDecimal<arrow::BasicDecimal128, 128, 2>> = {
          static kHighWordIndex = <optimized out>,
          static kBitWidth = 128,
          static kByteWidth = 16,
          static LittleEndianArray = <optimized out>,
          array_ = {
            _M_elems = {[0] = 1234567, [1] = 0}
          }
        },
        members of arrow::BasicDecimal128:
        static kMaxPrecision = 38,
        static kMaxScale = 38
      }, <No data fields>}
  }, <No data fields>}
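To see why this dump is hard to read, note that the decimal value is stored as an unscaled 128-bit integer split into two 64-bit words (array_ above), while the scale is not visible at all: it lives on the scalar's data type, of which the dump only shows a pointer. The following Python sketch, with hypothetical helper names, shows the arithmetic a pretty-printer has to perform. It assumes little-endian word order (words[0] is the low word, as on x86-64) and a non-negative scale; with a scale of 4, the words [1234567, 0] decode to 123.4567.

```python
def decimal128_words_to_int(words):
    """Combine the two 64-bit words of a Decimal128 into a signed integer.

    Assumes little-endian word order: words[0] is the low word.
    """
    low, high = words
    mask = 0xFFFFFFFFFFFFFFFF  # GDB may print the words as unsigned 64-bit values
    unsigned = ((high & mask) << 64) | (low & mask)
    # Reinterpret the 128-bit pattern as two's complement.
    if unsigned >= 1 << 127:
        unsigned -= 1 << 128
    return unsigned


def format_decimal(unscaled, scale):
    """Render unscaled * 10**-scale as a decimal string (scale >= 0 only)."""
    if scale < 0:
        raise ValueError("this sketch only handles non-negative scales")
    sign = "-" if unscaled < 0 else ""
    digits = str(abs(unscaled))
    if scale == 0:
        return sign + digits
    # Left-pad so there is at least one digit before the decimal point.
    digits = digits.rjust(scale + 1, "0")
    return f"{sign}{digits[:-scale]}.{digits[-scale:]}"


value = decimal128_words_to_int([1234567, 0])
print(format_decimal(value, scale=4))  # 123.4567
```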
Fortunately, GDB also allows custom extensions to override the default printing for specific types. We provide a GDB extension written in Python that enables pretty-printing for common Arrow C++ classes, to make debugging more productive. For example, here is how the aforementioned arrow::Status instance will be displayed:
$5 = arrow::Status::OK()
and here is the same arrow::Decimal128Scalar instance as above:
$6 = arrow::Decimal128Scalar of value 123.4567 [precision=10, scale=4]
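GDB pretty-printers are Python classes exposing a to_string() method, registered through the gdb.printing module. The following is a minimal, hypothetical sketch of a printer for arrow::Status; it is not the code of the Arrow extension. It assumes that Status holds a state_ pointer that is null for an OK status (consistent with state_ = 0x0 in the raw dump earlier), and that the pointed-to state has code and msg members, which may differ between Arrow versions. The import guard only exists so the class can be exercised outside GDB.

```python
# Hypothetical minimal pretty-printer sketch; not the Arrow extension's code.
try:
    import gdb
    import gdb.printing
except ImportError:  # not running inside GDB (e.g. in a unit test)
    gdb = None


class StatusPrinter:
    """Render arrow::Status the way a developer would write it in C++."""

    def __init__(self, val):
        self.val = val  # inside GDB, a gdb.Value of type arrow::Status

    def to_string(self):
        state_ptr = self.val["state_"]
        # Assumption: a null state_ pointer means the status is OK.
        if int(state_ptr) == 0:
            return "arrow::Status::OK()"
        # Assumption: the private state struct has `code` and `msg` members.
        state = state_ptr.dereference()
        return f"arrow::Status(code={state['code']}, msg={state['msg']})"


def build_printer():
    printer = gdb.printing.RegexpCollectionPrettyPrinter("arrow-status-sketch")
    # The regexp is matched against the value's basic type name.
    printer.add_printer("arrow::Status", "^arrow::Status$", StatusPrinter)
    return printer


if gdb is not None:
    # current_objfile() is set while GDB auto-loads a script for a library;
    # it is None when the file is sourced manually, which registers globally.
    # replace=True avoids an error if the file is sourced twice.
    gdb.printing.register_pretty_printer(
        gdb.current_objfile(), build_printer(), replace=True
    )
```

Saved to a file, this sketch can be loaded with the same source command used for the full extension.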
Manual loading
To enable the GDB extension for Arrow, you can simply download it somewhere on your computer and source it from the GDB prompt:
(gdb) source path/to/gdb_arrow.py
You will have to source it in each new GDB session. You might want to make this implicit by adding the source invocation to a gdbinit file.
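For example, a ~/.gdbinit containing the single line source /absolute/path/to/gdb_arrow.py loads the extension in every session. The hypothetical Python helper below appends that line only if it is not already present; the file locations passed to it are assumptions to adapt to your setup.

```python
from pathlib import Path


def ensure_sourced(gdbinit: Path, script: Path) -> bool:
    """Append 'source <script>' to gdbinit unless it is already present.

    Returns True if the file was modified. The script path is made absolute
    so that GDB finds it regardless of its working directory.
    """
    line = f"source {script.expanduser().resolve()}"
    text = gdbinit.read_text() if gdbinit.exists() else ""
    if line in (existing.strip() for existing in text.splitlines()):
        return False  # already configured
    with gdbinit.open("a") as f:
        if text and not text.endswith("\n"):
            f.write("\n")  # keep the new command on its own line
        f.write(line + "\n")
    return True
```

A typical call would be ensure_sourced(Path.home() / ".gdbinit", Path("~/gdb_arrow.py")), assuming the extension was downloaded to your home directory.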
Automatic loading
GDB provides a facility to automatically load scripts or extensions for each object file or library that is involved in a debugging session. You will need to:
1. Find out what the auto-load locations are for your GDB install. This can be determined using show subcommands on the GDB prompt; the answer will depend on the operating system. Here is an example on Ubuntu:
(gdb) show auto-load scripts-directory
List of directories from which to load auto-loaded scripts is $debugdir:$datadir/auto-load.
(gdb) show data-directory
GDB's data directory is "/usr/share/gdb".
(gdb) show debug-file-directory
The directory where separate debug symbols are searched for is "/usr/lib/debug".
This tells you that the directories used for auto-loading are $debugdir and $datadir/auto-load, which expand to /usr/lib/debug/ and /usr/share/gdb/auto-load respectively.
2. Find out the full path to the Arrow C++ DLL, with all symlinks resolved. For example, you might have installed Arrow 7.0 in /usr/local, and the path to the Arrow C++ DLL could then be /usr/local/lib/libarrow.so.700.0.0.
3. Determine the actual auto-load script path. It is computed by a) taking the path of the auto-load directory of your choice, b) appending the full path to the Arrow C++ DLL, and c) appending -gdb.py at the tail. In the example above, if we choose /usr/share/gdb/auto-load as the auto-load directory, the full path to the auto-load script will have to be /usr/share/gdb/auto-load/usr/local/lib/libarrow.so.700.0.0-gdb.py.
4. Either copy or symlink the GDB extension to the file path determined in step 3 above.
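The four steps above can be scripted. The following is a hypothetical helper, not part of Arrow; it assumes a Unix host (":"-separated directory lists) and that you pass in the values reported by the show commands from step 1. Writing under /usr/share/gdb/auto-load typically requires root privileges.

```python
import os
from pathlib import Path


def expand_autoload_dirs(scripts_directory: str, debugdir: str, datadir: str) -> list[str]:
    """Step 1: expand the $debugdir and $datadir placeholders reported by
    'show auto-load scripts-directory' (':'-separated on Unix hosts)."""
    return [
        entry.replace("$debugdir", debugdir).replace("$datadir", datadir)
        for entry in scripts_directory.split(":")
    ]


def autoload_script_path(autoload_dir: str, library: str) -> Path:
    """Steps 2 and 3: resolve symlinks in the library path, append it to the
    auto-load directory and add the '-gdb.py' suffix."""
    real_lib = os.path.realpath(library)
    # Plain concatenation on purpose: os.path.join would discard autoload_dir,
    # because real_lib is an absolute path.
    return Path(autoload_dir.rstrip("/") + real_lib + "-gdb.py")


def install_extension(extension: str, target: Path) -> None:
    """Step 4: symlink the GDB extension to the computed auto-load path."""
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.is_symlink() or target.exists():
        target.unlink()  # replace a previous installation
    target.symlink_to(os.path.realpath(extension))
```

With the Ubuntu values from step 1, autoload_script_path("/usr/share/gdb/auto-load", "/usr/local/lib/libarrow.so.700.0.0") yields the path given in step 3, provided none of those directories are themselves symlinks.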
If everything went well, then as soon as GDB encounters the Arrow C++ DLL, it will automatically load the Arrow GDB extension and pretty-print Arrow C++ classes at the GDB prompt.
Supported classes
The Arrow GDB extension provides pretty-printing for the core Arrow C++ classes:
arrow::DataType and subclasses
arrow::ArrayData, arrow::Array and subclasses
arrow::Scalar and subclasses
Important utility classes are also covered:
arrow::Buffer and subclasses

