Testing 🧪

In this section we outline the steps needed for unit testing in Arrow.

We use pytest for unit tests in Python. For more information about the required packages, see the Python unit testing section.

Structure

The test layout in PyArrow follows the pytest structure for tests as part of application code:

pyarrow/
├── __init__.py
├── csv.py
├── dataset.py
├── ...
└── tests/
    ├── __init__.py
    ├── test_csv.py
    ├── test_dataset.py
    └── ...

Tests for Parquet are located in a separate folder, pyarrow/tests/parquet/.

Running tests

To run a specific unit test, use this command in the terminal from the arrow/python folder:

$ pytest pyarrow/tests/test_file.py -k test_your_unit_test

Run all the tests from one file:

$ pytest pyarrow/tests/test_file.py

Run all the tests:

$ pytest pyarrow

You can also run the tests with python -m pytest [...], which is almost equivalent to using pytest [...] directly, except that calling via python will also add the current directory to sys.path and can in some cases help if pytest [...] results in an ImportError.

Recompiling PyArrow or Arrow C++

If the tests start failing, try recompiling PyArrow or Arrow C++. See the note in the Building other Arrow libraries section under the PyArrow tab.

Fixtures

PyArrow test files may define helper functions and fixtures. Other pytest decorators, such as @pytest.mark.parametrize or @pytest.mark.skipif, are also used.

For example:

  • _alltypes_example in test_pandas supplies a dataframe with 100 rows for all data types.

  • _check_pandas_roundtrip in test_pandas asserts that the roundtrip from pandas through pa.Table or pa.RecordBatch back to pandas yields the same result.

  • the large_buffer fixture supplies a PyArrow buffer of fixed size to the function test_primitive_serialization(large_buffer) in test_serialization.py.

For this reason it is good to look through the file you are planning to add the tests to and see if any of the defined functions or fixtures will be helpful.
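As an illustration of how these pieces fit together (the fixture and test names below are hypothetical, not taken from the PyArrow suite), a fixture combined with @pytest.mark.parametrize and @pytest.mark.skipif looks like this:

```python
import sys

import pytest


@pytest.fixture
def small_ints():
    # Hypothetical fixture: provides a small list of integers to any test
    # that names it as an argument, much like large_buffer above.
    return [1, 2, 3]


@pytest.mark.parametrize("value,expected", [(1, 2), (2, 4), (3, 6)])
def test_double(value, expected):
    # The test body runs once per (value, expected) pair.
    assert value * 2 == expected


@pytest.mark.skipif(sys.platform == "win32", reason="POSIX-only behaviour")
def test_sum(small_ints):
    # Skipped on Windows; on other platforms pytest injects the fixture.
    assert sum(small_ints) == 6
```

Running pytest -k test_double would then collect three parametrized cases from this file.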

For more information about pytest in general, visit the full pytest documentation.

We use testthat for unit testing in R. More specifically, we use the 3rd edition of testthat. On rare occasions we might want the behaviour of the 2nd edition of testthat, which is indicated by testthat::local_edition(2).

Structure

Expect the usual testthat folder structure:

tests
├── testthat     # test files live here
└── testthat.R   # runs tests when R CMD check runs (e.g. with devtools::check())

This is the fundamental structure of testing in R with testthat. Files such as testthat.R are not expected to change very often. For the arrow R package, testthat.R also defines how the results of the various tests are displayed/reported in the console.

Most files in the R/ sub-folder have a corresponding test file in tests/testthat.

Running tests

To run all tests in a package locally, call

devtools::test()

in the R console. Alternatively, you can use

$ make test

in the shell.

You can run the tests in a single test file you have open with

devtools::test_active_file()

All tests are also run as part of our continuous integration (CI) pipelines.

The Arrow R Developer guide also has a section on running tests.

Good practice

In general, any change to source code needs to be accompanied by unit tests. All tests are expected to pass before a pull request is merged.

  • Add functionality -> add unit tests

  • Modify functionality -> update unit tests

  • Solve a bug -> add a unit test before solving it, which helps prove the bug and its fix

  • Performance improvements should be reflected in benchmarks (which are also tests)

  • An exception could be refactoring functionality that is fully covered by unit tests

A good rule of thumb is: if the new functionality is a user-facing or API change, you will almost certainly need to change tests. If no tests need to be changed, it might mean the tests aren't right! If the new functionality is a refactor and no APIs are changing, there might not need to be test changes.

Testing helpers

To complement the testthat functionality, the arrow R package defines a series of specific utility functions (called helpers), such as:

  • expectations - these start with expect_ and are used to compare objects

    • for example, the expect_…_roundtrip() functions take an input, convert it to some other format (e.g. arrow, altrep) and then convert it back, confirming that the values are the same.

      x <- c(1, 2, 3, NA_real_)
      expect_altrep_roundtrip(x, min, na.rm = TRUE)
  • skip_ - skips a unit test; think of these as acceptable fails. Situations in which we might want to skip unit tests:

    • skip_if_r_version() - this is a specific arrow skip. For example, we use this to skip a unit test when the R version is 3.5.0 and below (skip_if_r_version("3.5.0")). You will likely see it used when the functionality we are testing depends on features introduced after version 3.5.0 of R (such as the alternative representation of vectors, Altrep, introduced in R 3.5.0, but with significant additions in subsequent releases). As part of our CI workflow we test against different versions of R, and this is where this feature comes in.

    • skip_if_not_available() - another specific {arrow} skip. Arrow (libarrow) has a number of optional features that can be switched on or off at build time. If a unit test depends on such a feature and the feature is not available (i.e. was not selected when libarrow was built), the test is skipped rather than failing.

    • skip_if_offline() - will not run tests that require an internet connection

    • skip_on_os() - for unit tests that are OS specific.

    Important: once the conditions for a skip_() statement are met, no other line of code in the same test_that() test block will be executed. If the skip is outside of a test_that() code block, it will skip the rest of the file.

For more information about unit testing in R in general:

  • the testthat website

  • the R Packages book by Hadley Wickham and Jenny Bryan