Packaging and Testing with Crossbow#
The content of the arrow/dev/tasks directory aims to automate Arrow packaging and integration testing.
- Packages:
  - C++ and Python conda-forge packages for Linux, macOS and Windows
  - Python Wheels for Linux, macOS and Windows
  - C++ and GLib Linux packages for multiple distributions
  - Java for Gandiva
- Integration tests:
  - Various docker tests
  - Pandas
  - Dask
  - Turbodbc
  - HDFS
  - Spark
Architecture#
Executors#
Individual jobs are executed on public CI services, currently:
Linux: GitHub Actions, Travis CI, Azure Pipelines
macOS: GitHub Actions, Azure Pipelines
Windows: GitHub Actions, Azure Pipelines
Queue#
Because of how the CI services work, jobs are scheduled through an additional git repository, which acts as a job queue for the tasks. Anyone can host a queue repository (usually named <ghuser>/crossbow).
A job is a git commit on a particular git branch, containing the required configuration files to run the requested builds (like .travis.yml, azure-pipelines.yml, or crossbow.yml for GitHub Actions).
Scheduler#
Crossbow handles version generation, task rendering and submission. The tasks are defined in tasks.yml.
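To make "task rendering" concrete, here is a minimal sketch of substituting parameters into a configuration template. It uses Python's standard string.Template purely for illustration. The template text, the render_task helper and the parameter names are hypothetical; they are not taken from tasks.yml or from Crossbow's actual rendering code.

```python
from string import Template

# Hypothetical CI configuration template (an assumption for illustration);
# the real task definitions live under arrow/dev/tasks.
CI_TEMPLATE = Template(
    "name: $task_name\n"
    "env:\n"
    "  ARROW_VERSION: $version\n"
    "  ARROW_COMMIT: $commit\n"
)


def render_task(task_name: str, version: str, commit: str) -> str:
    """Return the template text with every placeholder substituted."""
    # substitute() raises KeyError when a placeholder has no value, so an
    # incomplete parameter set fails early instead of rendering a broken file.
    return CI_TEMPLATE.substitute(
        task_name=task_name, version=version, commit=commit
    )


rendered = render_task(
    task_name="conda-linux",
    version="0.9.1.dev48+g810a7188.d20180414",
    commit="810a718836bb3a8cefc053055600bdcc440e6702",
)
print(rendered)
```

The point of the sketch is the order of operations: the version and commit are determined first, then substituted into each task's configuration before anything is submitted.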
Install#
The following guide depends on GitHub, but theoretically any git server can be used.
If you are not using the ursacomputing/crossbow repository, you will need to complete the first two steps, otherwise proceed to step 3:
1. Create the queue repository.
2. Enable Azure Pipelines integrations for the newly created queue repository.
3. Clone either ursacomputing/crossbow if you are using that, or the newly created repository, next to the arrow repository:
   By default the scripts look for a crossbow clone next to the arrow directory, but this can be configured through command line arguments.
   git clone https://github.com/<user>/crossbow crossbow
   Important note: Crossbow only supports GitHub token based authentication. Although it overwrites repository URLs provided with the ssh protocol, it's advisable to use the HTTPS repository URLs.
4. Create a Personal Access Token with repo and workflow permissions (other permissions are not needed).
5. Locally export the token as an environment variable:
   export GH_TOKEN=<token>
   or pass it as an argument to the CLI script: --github-token
6. Install Python (minimum supported version is 3.10):
   Miniconda is preferred; see its installation instructions.
7. Install the archery toolset containing crossbow itself:
   $ pip install -e "arrow/dev/archery[crossbow]"
8. Try running it:
   $ archery crossbow --help
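The two ways of supplying the token (the GH_TOKEN environment variable or the --github-token argument) can be sketched as follows. This is an illustration, not archery's own option handling: the resolve_github_token helper is hypothetical, and letting the command line argument take precedence over the environment variable is an assumption.

```python
import argparse


def resolve_github_token(argv, environ):
    """Return the token from --github-token, falling back to GH_TOKEN."""
    parser = argparse.ArgumentParser(prog="token-demo")
    parser.add_argument("--github-token", default=None)
    # parse_known_args ignores unrelated arguments instead of failing on them.
    args, _unknown = parser.parse_known_args(argv)
    # Assumption: an explicit argument wins over the environment variable.
    token = args.github_token or environ.get("GH_TOKEN")
    if not token:
        raise SystemExit("no GitHub token: set GH_TOKEN or pass --github-token")
    return token


print(resolve_github_token(["--github-token", "from-cli"], {"GH_TOKEN": "from-env"}))
print(resolve_github_token([], {"GH_TOKEN": "from-env"}))
```

In a real script the arguments would be sys.argv[1:] and os.environ; plain lists and dicts are passed here so the behaviour is easy to check.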
Usage#
The script does the following:
1. Detects the current repository, and thus supports forks. The following snippet will build kszucs's fork instead of the upstream apache/arrow repository.
   $ git clone https://github.com/kszucs/arrow
   $ git clone https://github.com/kszucs/crossbow
   $ cd arrow/dev/tasks
   $ archery crossbow submit --help  # show the available options
   $ archery crossbow submit conda-win conda-linux conda-osx
2. Gets the HEAD commit of the currently checked out branch and generates the version number based on setuptools_scm. So to build a particular branch, check it out before running the script:
   $ git checkout ARROW-<ticket number>
   $ archery crossbow submit --dry-run conda-linux conda-osx
   Note that the arrow branch must be pushed beforehand, because the script will clone the selected branch.
3. Reads and renders the required build configurations with the parameters substituted.
4. Creates a branch per task, prefixed with the job id. For example, to build conda recipes on Linux, it will create a new branch: crossbow@build-<id>-conda-linux.
5. Pushes the modified branches to GitHub, which triggers the builds. For authentication it uses the GitHub OAuth tokens described in the install section.
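The branch-per-task naming can be sketched in a few lines. The task_branches helper and the job id "build-42" are hypothetical, and reading crossbow@build-<id>-conda-linux as "branch build-<id>-conda-linux in the crossbow queue repository" is an interpretation of the notation above, not something taken from Crossbow's code.

```python
def task_branches(job_id: str, task_names: list[str]) -> dict[str, str]:
    """Map each task name to a branch name prefixed with the job id."""
    return {task: f"{job_id}-{task}" for task in task_names}


# "build-42" is a made-up job id standing in for build-<id>.
branches = task_branches("build-42", ["conda-linux", "conda-osx"])
for task, branch in branches.items():
    # Printed in the repository@branch notation used in this section.
    print(f"{task} -> crossbow@{branch}")
```

Because every branch shares the job id prefix, all branches belonging to one submission can be found by that prefix alone.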
Query the build status#
Build id (which has a corresponding branch in the queue repository) is returned by the submit command.
$ archery crossbow status <build id / branch name>
Download the build artifacts#
$ archery crossbow artifacts <build id / branch name>
Examples#
The submit command accepts a list of task names and/or a list of task-group names to select which tasks to build.
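How task names and task-group names might combine into one selection can be sketched as below. The GROUPS table is hypothetical: the real groups and their members are defined in tasks.yml, and the wheel task names are invented for the example. The select_tasks helper is likewise an illustration, not Crossbow's implementation.

```python
# Hypothetical group table (an assumption); see tasks.yml for the real groups.
GROUPS = {
    "conda": ["conda-linux", "conda-osx", "conda-win"],
    "wheel": ["wheel-linux", "wheel-osx", "wheel-win"],
}


def select_tasks(task_names, group_names, groups=GROUPS):
    """Return the de-duplicated task list, in first-seen order."""
    selected = []
    for group in group_names:
        if group not in groups:
            raise KeyError(f"unknown task group: {group}")
        selected.extend(groups[group])
    selected.extend(task_names)
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(selected))


print(select_tasks(["centos-7", "conda-linux"], ["conda"]))
```

Naming a task both directly and through a group selects it once, which is why the sketch removes duplicates before returning.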
Run multiple builds:
$ archery crossbow submit debian-stretch conda-linux-gcc-py37-r40
Repository: https://github.com/kszucs/arrow@tasks
Commit SHA: 810a718836bb3a8cefc053055600bdcc440e6702
Version: 0.9.1.dev48+g810a7188.d20180414
Pushed branches:
 - debian-stretch
 - conda-linux-gcc-py37-r40
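The version string in that output follows the development-version layout that setuptools_scm produces by default: the guessed next release, the number of commits since the last tag, the abbreviated commit hash, and a date suffix when the working tree has uncommitted changes. A small sketch that takes it apart (the parse_dev_version helper and the regular expression are illustrative, not part of Crossbow or setuptools_scm):

```python
import re

VERSION_RE = re.compile(
    r"^(?P<base>\d+(?:\.\d+)*)"   # guessed next release, e.g. 0.9.1
    r"\.dev(?P<distance>\d+)"     # commits since the last tag
    r"\+g(?P<node>[0-9a-f]+)"     # abbreviated git commit hash
    r"(?:\.d(?P<date>\d{8}))?$"   # date suffix, present for a dirty tree
)


def parse_dev_version(version: str) -> dict:
    """Split a development version string into its named parts."""
    match = VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a development version: {version!r}")
    return match.groupdict()


info = parse_dev_version("0.9.1.dev48+g810a7188.d20180414")
print(info)

# The abbreviated hash is a prefix of the full commit SHA reported by submit.
sha = "810a718836bb3a8cefc053055600bdcc440e6702"
print(sha.startswith(info["node"]))
```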
Just render without applying or committing the changes:
$ archery crossbow submit --dry-run task_name
Run only conda package builds and a Linux one:
$ archery crossbow submit --group conda centos-7
Run wheel builds:
$ archery crossbow submit --group wheel
There are multiple task groups in the tasks.yml like docker, integration and cpp-python for running docker based tests.
archery crossbow submit supports multiple options and arguments, for more see its help page:
$ archery crossbow submit --help
