Building Arrow Java#

System Setup#

Arrow Java uses the Maven build system.

Building requires:

  • JDK 11+

  • Maven 3+

Note

CI will test all supported JDK LTS versions, plus the latest non-LTS version.
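
Before running any of the builds below, it can help to confirm that the local JDK meets the requirement above. The following is a minimal sketch and not part of the Arrow build: the function names `java_major_version` and `meets_jdk_requirement` are hypothetical, and the parsing assumes the usual version-string shapes (`1.8.0_292` for legacy releases; `11.0.2`, `17` or `21-ea` for newer ones).

```shell
#!/bin/sh
# Hypothetical helper (not part of Arrow): print the major Java version
# for a version string such as "1.8.0_292", "11.0.2", "17" or "21-ea".
java_major_version() {
    v="$1"
    case "$v" in
        1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    v="${v%%[!0-9]*}"         # keep only the leading digits
    echo "$v"
}

# Succeed when the given version string satisfies "JDK 11+".
meets_jdk_requirement() {
    major="$(java_major_version "$1")"
    [ -n "$major" ] && [ "$major" -ge 11 ]
}

# Example usage (assumes `java` is on PATH and prints a quoted version string):
#   current="$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')"
#   meets_jdk_requirement "$current" || echo "JDK 11+ required, found: $current" >&2
```

The check is kept separate from the call to `java` so that the parsing can be exercised without a JDK installed.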

Building#

All the instructions below assume that you have cloned the Arrow git repository:

$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive

Arrow Java modules can be compiled with any of the following tools:

  • Maven build tool.

  • Docker Compose.

  • Archery.

Building Java Modules#

To build the default modules, go to the project root and execute:

Maven#

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ mvn clean install

Docker compose#

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ docker compose run java

Archery#

$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ archery docker run java

Building JNI Libraries (*.dylib / *.so / *.dll)#

First, we need to build the C++ shared libraries that the JNI bindings will use. We can build these manually, or we can use Archery to build them in a Docker container (this requires installing Docker, Docker Compose, and Archery).

Note

If you are building on Apple Silicon, be sure to use a JDK version that was compiled for that architecture. See, for example, the Azul JDK.

If you are building on Windows OS, see Developing on Windows.

Maven#

  • To build only the JNI C Data Interface library (macOS / Linux):

    $ cd arrow/java
    $ export JAVA_HOME=<absolute path to your java home>
    $ java --version
    $ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
    $ ls -latr ../java-dist/lib
    |__ arrow_cdata_jni/

  • To build only the JNI C Data Interface library (Windows):

    $ cd arrow/java
    $ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
    $ dir "../java-dist/bin"
    |__ arrow_cdata_jni/

  • To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:

    $ cd arrow/java
    $ export JAVA_HOME=<absolute path to your java home>
    $ java --version
    $ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
    $ ls -latr ../java-dist/lib
    |__ arrow_dataset_jni/
    |__ arrow_orc_jni/
    |__ gandiva_jni/

  • To build all JNI libraries (Windows) except the JNI C Data Interface library:

    $ cd arrow/java
    $ mvn generate-resources -Pgenerate-libs-jni-windows -N
    $ dir "../java-dist/bin"
    |__ arrow_dataset_jni/

CMake#

  • To build only the JNI C Data Interface library (macOS / Linux):

    $ cd arrow
    $ mkdir -p java-dist java-cdata
    $ cmake \
        -S java \
        -B java-cdata \
        -DARROW_JAVA_JNI_ENABLE_C=ON \
        -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
        -DBUILD_TESTING=OFF \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=java-dist
    $ cmake --build java-cdata --target install --config Release
    $ ls -latr java-dist/lib
    |__ arrow_cdata_jni/

  • To build only the JNI C Data Interface library (Windows):

    $ cd arrow
    $ mkdir java-dist, java-cdata
    $ cmake ^
        -S java ^
        -B java-cdata ^
        -DARROW_JAVA_JNI_ENABLE_C=ON ^
        -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
        -DBUILD_TESTING=OFF ^
        -DCMAKE_BUILD_TYPE=Release ^
        -DCMAKE_INSTALL_PREFIX=java-dist
    $ cmake --build java-cdata --target install --config Release
    $ dir "java-dist/bin"
    |__ arrow_cdata_jni/

  • To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:

    $ cd arrow
    $ brew bundle --file=cpp/Brewfile
    # Homebrew Bundle complete! 25 Brewfile dependencies now installed.
    $ brew uninstall aws-sdk-cpp
    #  (We can't use aws-sdk-cpp installed by Homebrew because it has
    #  an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
    $ export JAVA_HOME=<absolute path to your java home>
    $ mkdir -p java-dist cpp-jni
    $ cmake \
        -S cpp \
        -B cpp-jni \
        -DARROW_BUILD_SHARED=OFF \
        -DARROW_CSV=ON \
        -DARROW_DATASET=ON \
        -DARROW_DEPENDENCY_SOURCE=BUNDLED \
        -DARROW_DEPENDENCY_USE_SHARED=OFF \
        -DARROW_FILESYSTEM=ON \
        -DARROW_GANDIVA=ON \
        -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
        -DARROW_JSON=ON \
        -DARROW_ORC=ON \
        -DARROW_PARQUET=ON \
        -DARROW_S3=ON \
        -DARROW_SUBSTRAIT=ON \
        -DARROW_USE_CCACHE=ON \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=java-dist \
        -DCMAKE_UNITY_BUILD=ON
    $ cmake --build cpp-jni --target install --config Release
    $ cmake \
        -S java \
        -B java-jni \
        -DARROW_JAVA_JNI_ENABLE_C=OFF \
        -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
        -DBUILD_TESTING=OFF \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_INSTALL_PREFIX=java-dist \
        -DCMAKE_PREFIX_PATH=$PWD/java-dist \
        -DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
        -DProtobuf_USE_STATIC_LIBS=ON
    $ cmake --build java-jni --target install --config Release
    $ ls -latr java-dist/lib/
    |__ arrow_dataset_jni/
    |__ arrow_orc_jni/
    |__ gandiva_jni/

  • To build all JNI libraries (Windows) except the JNI C Data Interface library:

    $ cd arrow
    $ mkdir java-dist, cpp-jni
    $ cmake ^
        -S cpp ^
        -B cpp-jni ^
        -DARROW_BUILD_SHARED=OFF ^
        -DARROW_CSV=ON ^
        -DARROW_DATASET=ON ^
        -DARROW_DEPENDENCY_USE_SHARED=OFF ^
        -DARROW_FILESYSTEM=ON ^
        -DARROW_GANDIVA=OFF ^
        -DARROW_JSON=ON ^
        -DARROW_ORC=ON ^
        -DARROW_PARQUET=ON ^
        -DARROW_S3=ON ^
        -DARROW_SUBSTRAIT=ON ^
        -DARROW_USE_CCACHE=ON ^
        -DARROW_WITH_BROTLI=ON ^
        -DARROW_WITH_LZ4=ON ^
        -DARROW_WITH_SNAPPY=ON ^
        -DARROW_WITH_ZLIB=ON ^
        -DARROW_WITH_ZSTD=ON ^
        -DCMAKE_BUILD_TYPE=Release ^
        -DCMAKE_INSTALL_PREFIX=java-dist ^
        -DCMAKE_UNITY_BUILD=ON ^
        -GNinja
    $ cd cpp-jni
    $ ninja install
    $ cd ../
    $ cmake ^
        -S java ^
        -B java-jni ^
        -DARROW_JAVA_JNI_ENABLE_C=OFF ^
        -DARROW_JAVA_JNI_ENABLE_DATASET=ON ^
        -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON ^
        -DARROW_JAVA_JNI_ENABLE_GANDIVA=OFF ^
        -DARROW_JAVA_JNI_ENABLE_ORC=ON ^
        -DBUILD_TESTING=OFF ^
        -DCMAKE_BUILD_TYPE=Release ^
        -DCMAKE_INSTALL_PREFIX=java-dist ^
        -DCMAKE_PREFIX_PATH=$PWD/java-dist
    $ cmake --build java-jni --target install --config Release
    $ dir "java-dist/bin"
    |__ arrow_orc_jni/
    |__ arrow_dataset_jni/

Archery#

$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist
|__ arrow_cdata_jni/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/

Building Java JNI Modules#

  • To compile the JNI bindings, use the arrow-c-data Maven profile:

    $ cd arrow/java
    $ mvn -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib -Parrow-c-data clean install

  • To compile the JNI bindings for ORC / Gandiva / Dataset, use the arrow-jni Maven profile:

    $ cd arrow/java
    $ mvn \
        -Darrow.cpp.build.dir=<absolute path to your arrow folder>/java-dist/lib/ \
        -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib/ \
        -Parrow-jni clean install

Testing#

By default, Maven uses the same Java version to both build the code and run the tests.

It is also possible to use a different JDK version for the tests. This requires Maven toolchains to be configured beforehand, and then a specific test property needs to be set.

Configuring Maven toolchains#

To use a JDK version for testing, first register it in the Maven toolchains.xml configuration file, usually located under ${HOME}/.m2, by adding the following snippet to it:

<?xml version="1.0" encoding="UTF8"?>
<toolchains>
  [...]
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>21</version> <!-- Replace with the corresponding JDK version: 11, 17, ... -->
      <vendor>temurin</vendor> <!-- Replace with the vendor/distribution: temurin, oracle, zulu ... -->
    </provides>
    <configuration>
      <jdkHome>path/to/jdk/home</jdkHome> <!-- Replace with the path to the JDK -->
    </configuration>
  </toolchain>
  [...]
</toolchains>
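
As a quick sanity check, something like the following can confirm that a JDK version is registered before the tests are run. This is only a sketch: the function name `toolchain_has_jdk` is hypothetical, and it relies on a plain text match of the `<version>` element rather than real XML parsing, so it assumes the element sits on a single line.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the toolchains file ($1) declares
# a <version> element equal to the requested JDK version ($2).
toolchain_has_jdk() {
    file="$1"
    version="$2"
    [ -f "$file" ] &&
        grep -q "<version>[[:space:]]*${version}[[:space:]]*</version>" "$file"
}

# Example usage (the default location is an assumption; adjust to your setup):
#   toolchain_has_jdk "${HOME}/.m2/toolchains.xml" 17 ||
#       echo "JDK 17 is not registered in toolchains.xml" >&2
```

A text match cannot tell which `<toolchain>` entry a version belongs to, so treat a positive result as a hint rather than proof that Maven will select that JDK.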

Testing with a specific JDK#

To run Arrow tests with a specific JDK version, use the arrow.test.jdk-version property.

For example, to run Arrow tests with JDK 17, use the following snippet:

$ cd arrow/java
$ mvn -Darrow.test.jdk-version=17 clean verify

IDE Configuration#

IntelliJ#

To start working on Arrow in IntelliJ: build the project once from the command line using mvn clean install. Then open the java/ subdirectory of the Arrow repository, and update the following settings:

  • In the Files tool window, find the path vector/target/generated-sources, right-click the directory, and select Mark Directory as > Generated Sources Root. There is no need to mark other generated sources directories, as only the vector module generates sources.

  • For JDK 11, due to an IntelliJ bug, you must go into Settings > Build, Execution, Deployment > Compiler > Java Compiler and disable “Use ‘--release’ option for cross-compilation (Java 9 and later)”. Otherwise you will get an error like “package sun.misc does not exist”.

  • You may want to disable error-prone entirely if it gives spurious warnings (disable both error-prone profiles in the Maven tool window and “Reload All Maven Projects”).

  • If using IntelliJ’s Maven integration to build, you may need to change <fork> to false in the pom.xml files due to an IntelliJ bug.

  • To enable debugging JNI-based modules like dataset, activate specific profiles in the Maven tab under “Profiles”. Ensure the profiles arrow-c-data, arrow-jni, generate-libs-cdata-all-os, generate-libs-jni-macos-linux, and jdk11+ are enabled, so that the IDE can build them and enable debugging.

You may not need to update all of these settings if you build/test with the IntelliJ Maven integration instead of with IntelliJ directly.

Common Errors#

  • When working with the JNI code: if the C++ build cannot find dependencies, with errors like these:

    Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
    Could NOT find Lz4 (missing: LZ4_LIB)
    Could NOT find zstd (missing: ZSTD_LIB)

    Specify that the dependencies should be downloaded at build time (more details at Dependency Resolution):

    -Dre2_SOURCE=BUNDLED \
    -DBoost_SOURCE=BUNDLED \
    -Dutf8proc_SOURCE=BUNDLED \
    -DSnappy_SOURCE=BUNDLED \
    -DORC_SOURCE=BUNDLED \
    -DZLIB_SOURCE=BUNDLED
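
These flags belong on the `cmake -S cpp ...` configure command. As a rough illustration, they can be generated from a list of dependency names instead of being typed out one by one. The helper name `bundled_flags` is hypothetical; the dependency names are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print one -D<name>_SOURCE=BUNDLED flag per argument,
# one flag per line.
bundled_flags() {
    for dep in "$@"; do
        printf '%s\n' "-D${dep}_SOURCE=BUNDLED"
    done
}

# Example usage (unquoted on purpose so the shell splits the flags into words):
#   cmake -S cpp -B cpp-jni $(bundled_flags re2 Boost utf8proc Snappy ORC ZLIB) <other options>
```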

Installing Nightly Packages#

Warning

These packages are not official releases. Use them at your own risk.

Arrow nightly builds are posted on the mailing list at builds@arrow.apache.org. The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at GitHub Nightly.

Installing from Apache Nightlies#

  1. Look up the nightly version number for the Arrow libraries used.

    For example, for arrow-memory, visit https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory/ and see what versions are available (e.g. 9.0.0.dev501).

  2. Add Apache Nightlies Repository to the Maven/Gradle project.

    <properties>
        <arrow.version>9.0.0.dev501</arrow.version>
    </properties>
    ...
    <repositories>
        <repository>
            <id>arrow-apache-nightlies</id>
            <url>https://nightlies.apache.org/arrow/java</url>
        </repository>
    </repositories>
    ...
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>
    ...

Installing Manually#

  1. Decide which nightly packages repository to use, for example: ursacomputing/crossbow

  2. Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <arrow.version>9.0.0.dev501</arrow.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>flight-core</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>

  3. Download the necessary pom and jar files to a temporary directory:

    $ mkdir nightly-packaging-2022-07-30-0-github-java-jars
    $ cd nightly-packaging-2022-07-30-0-github-java-jars
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
    $ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
    $ tree
    .
    ├── arrow-flight-9.0.0.dev501.pom
    ├── arrow-format-9.0.0.dev501.jar
    ├── arrow-format-9.0.0.dev501.pom
    ├── arrow-java-root-9.0.0.dev501.pom
    ├── arrow-memory-9.0.0.dev501.pom
    ├── arrow-memory-core-9.0.0.dev501.jar
    ├── arrow-memory-core-9.0.0.dev501.pom
    ├── arrow-memory-netty-9.0.0.dev501.jar
    ├── arrow-memory-netty-9.0.0.dev501.pom
    ├── arrow-vector-9.0.0.dev501.jar
    ├── arrow-vector-9.0.0.dev501.pom
    ├── flight-core-9.0.0.dev501.jar
    └── flight-core-9.0.0.dev501.pom

  4. Install the artifacts to the local Maven repository with mvn install:install-file:

    $ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
    $ mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
    $ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar

  5. Validate that the packages were installed:

    $ tree ~/.m2/repository/org/apache/arrow
    .
    ├── arrow-flight
    │   ├── 9.0.0.dev501
    │   └── arrow-flight-9.0.0.dev501.pom
    ├── arrow-format
    │   ├── 9.0.0.dev501
    │   ├── arrow-format-9.0.0.dev501.jar
    │   └── arrow-format-9.0.0.dev501.pom
    ├── arrow-java-root
    │   ├── 9.0.0.dev501
    │   └── arrow-java-root-9.0.0.dev501.pom
    ├── arrow-memory
    │   ├── 9.0.0.dev501
    │   └── arrow-memory-9.0.0.dev501.pom
    ├── arrow-memory-core
    │   ├── 9.0.0.dev501
    │   ├── arrow-memory-core-9.0.0.dev501.jar
    │   └── arrow-memory-core-9.0.0.dev501.pom
    ├── arrow-memory-netty
    │   ├── 9.0.0.dev501
    │   ├── arrow-memory-netty-9.0.0.dev501.jar
    │   └── arrow-memory-netty-9.0.0.dev501.pom
    ├── arrow-vector
    │   ├── 9.0.0.dev501
    │   ├── _remote.repositories
    │   ├── arrow-vector-9.0.0.dev501.jar
    │   └── arrow-vector-9.0.0.dev501.pom
    └── flight-core
        ├── 9.0.0.dev501
        ├── flight-core-9.0.0.dev501.jar
        └── flight-core-9.0.0.dev501.pom

  6. Compile your project as usual with mvn clean install.
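
Step 4 repeats the same command for every downloaded file, so it lends itself to a loop. The following is a minimal sketch, assuming every file is named `<artifactId>-<version>.<pom|jar>` as in the listing from step 3. The function name `install_cmd` is hypothetical, and it only prints the `mvn install:install-file` command so that the result can be reviewed before anything is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the install-file command for one downloaded file.
#   $1 = file name, e.g. arrow-format-9.0.0.dev501.jar
#   $2 = version,   e.g. 9.0.0.dev501
install_cmd() {
    file="$1"
    version="$2"
    packaging=${file##*.}            # text after the last dot: pom or jar
    base=${file%.*}                  # file name without that extension
    artifact=${base%-"$version"}     # strip the trailing -<version>
    echo "mvn install:install-file -Dfile=\"$(pwd)/${file}\" -DgroupId=org.apache.arrow -DartifactId=${artifact} -Dversion=${version} -Dpackaging=${packaging}"
}

# Example usage, run inside the download directory:
#   for f in *.pom *.jar; do install_cmd "$f" 9.0.0.dev501; done
```

Deriving the artifact ID from the file name avoids the copy-and-paste mistakes that a hand-written list of commands invites, at the cost of depending on the naming convention.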

Installing Staging Packages#

Warning

These packages are not official releases. Use them at your own risk.

Arrow staging builds are created when a Release Candidate (RC) is being prepared. This allows users to test the RC in their applications before voting on the release.

Installing from Apache Staging#

  1. Look up the next version number for the Arrow libraries used.

  2. Add Apache Staging Repository to the Maven/Gradle project.

    <properties>
        <arrow.version>9.0.0</arrow.version>
    </properties>
    ...
    <repositories>
        <repository>
            <id>arrow-apache-staging</id>
            <url>https://repository.apache.org/content/repositories/staging</url>
        </repository>
    </repositories>
    ...
    <dependencies>
        <dependency>
            <groupId>org.apache.arrow</groupId>
            <artifactId>arrow-vector</artifactId>
            <version>${arrow.version}</version>
        </dependency>
    </dependencies>
    ...