Building Arrow Java#
System Setup#
Arrow Java uses the Maven build system.
Building requires:
JDK 11+
Maven 3+
Note
CI will test all supported JDK LTS versions, plus the latest non-LTS version.
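The JDK 11+ requirement can be checked before starting a build. The following is a minimal sketch, not part of the Arrow tooling: the helper names `java_major` and `jdk_is_supported` are hypothetical, and the example runs on literal version strings so it does not need a JDK installed.

```shell
#!/bin/sh
# Extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "17.0.2" scheme.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 | sed 's/[^0-9].*$//' ;;
  esac
}

# Succeed when the version string meets the JDK 11+ requirement stated above.
jdk_is_supported() {
  [ "$(java_major "$1")" -ge 11 ]
}

# Example on literal version strings.
for v in 1.8.0_292 11.0.20 21; do
  if jdk_is_supported "$v"; then
    echo "$v: ok"
  else
    echo "$v: too old"
  fi
done
```

To check the installed JDK, the version string can be taken from the first line of `java -version`, for example `jdk_is_supported "$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)"`.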
Building#
All the instructions below assume that you have cloned the Arrow git repository:
$ git clone https://github.com/apache/arrow.git
$ cd arrow
$ git submodule update --init --recursive
Arrow Java modules can be compiled with any of the following:
Maven build tool.
Docker Compose.
Archery.
Building Java Modules#
To build the default modules, go to the project root and execute:
Maven#
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ mvn clean install
Docker compose#
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ docker compose run java
Archery#
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ archery docker run java
Building JNI Libraries (*.dylib / *.so / *.dll)#
First, we need to build the C++ shared libraries that the JNI bindings will use. We can build these manually or we can use Archery to build them using a Docker container (this will require installing Docker, Docker Compose, and Archery).
Note
If you are building on Apple Silicon, be sure to use a JDK version that was compiled for that architecture. See, for example, the Azul JDK.
If you are building on Windows OS, see Developing on Windows.
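For the Apple Silicon note, one way to spot a mismatched JDK is to compare the machine architecture with the architecture the JVM reports. This is a rough sketch under assumptions: the helpers `normalize_arch` and `same_arch` are hypothetical names, and the mapping covers only the common arm64/aarch64 and x86_64/amd64 spellings.

```shell
#!/bin/sh
# Map the names used by `uname -m` and by the JVM's os.arch onto one spelling.
normalize_arch() {
  case "$1" in
    arm64|aarch64) echo aarch64 ;;
    x86_64|amd64)  echo x86_64 ;;
    *)             echo "$1" ;;
  esac
}

# Succeed when both names refer to the same architecture.
same_arch() {
  [ "$(normalize_arch "$1")" = "$(normalize_arch "$2")" ]
}

# Example on literal values: a native Apple Silicon JDK, then an x86_64 JDK.
if same_arch arm64 aarch64; then
  echo "arm64 machine, aarch64 JDK: match"
fi
if ! same_arch arm64 x86_64; then
  echo "arm64 machine, x86_64 JDK: mismatch"
fi
```

On a real machine, the two values can be obtained with `uname -m` and with `java -XshowSettings:properties -version 2>&1 | sed -n 's/^ *os\.arch = //p'`.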
Maven#
To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ ls -latr ../java-dist/lib
|__ arrow_cdata_jni/
To build only the JNI C Data Interface library (Windows):
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-cdata-all-os -N
$ dir "../java-dist/bin"
|__ arrow_cdata_jni/
To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow/java
$ export JAVA_HOME=<absolute path to your java home>
$ java --version
$ mvn generate-resources -Pgenerate-libs-jni-macos-linux -N
$ ls -latr ../java-dist/lib
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow/java
$ mvn generate-resources -Pgenerate-libs-jni-windows -N
$ dir "../java-dist/bin"
|__ arrow_dataset_jni/
CMake#
To build only the JNI C Data Interface library (macOS / Linux):
$ cd arrow
$ mkdir -p java-dist java-cdata
$ cmake \
    -S java \
    -B java-cdata \
    -DARROW_JAVA_JNI_ENABLE_C=ON \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ ls -latr java-dist/lib
|__ arrow_cdata_jni/
To build only the JNI C Data Interface library (Windows):
$ cd arrow
$ mkdir java-dist, java-cdata
$ cmake ^
    -S java ^
    -B java-cdata ^
    -DARROW_JAVA_JNI_ENABLE_C=ON ^
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=OFF ^
    -DBUILD_TESTING=OFF ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_PREFIX=java-dist
$ cmake --build java-cdata --target install --config Release
$ dir "java-dist/bin"
|__ arrow_cdata_jni/
To build all JNI libraries (macOS / Linux) except the JNI C Data Interface library:
$ cd arrow
$ brew bundle --file=cpp/Brewfile
# Homebrew Bundle complete! 25 Brewfile dependencies now installed.
$ brew uninstall aws-sdk-cpp
# (We can't use aws-sdk-cpp installed by Homebrew because it has
# an issue: https://github.com/aws/aws-sdk-cpp/issues/1809 )
$ export JAVA_HOME=<absolute path to your java home>
$ mkdir -p java-dist cpp-jni
$ cmake \
    -S cpp \
    -B cpp-jni \
    -DARROW_BUILD_SHARED=OFF \
    -DARROW_CSV=ON \
    -DARROW_DATASET=ON \
    -DARROW_DEPENDENCY_SOURCE=BUNDLED \
    -DARROW_DEPENDENCY_USE_SHARED=OFF \
    -DARROW_FILESYSTEM=ON \
    -DARROW_GANDIVA=ON \
    -DARROW_GANDIVA_STATIC_LIBSTDCPP=ON \
    -DARROW_JSON=ON \
    -DARROW_ORC=ON \
    -DARROW_PARQUET=ON \
    -DARROW_S3=ON \
    -DARROW_SUBSTRAIT=ON \
    -DARROW_USE_CCACHE=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_UNITY_BUILD=ON
$ cmake --build cpp-jni --target install --config Release
$ cmake \
    -S java \
    -B java-jni \
    -DARROW_JAVA_JNI_ENABLE_C=OFF \
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON \
    -DBUILD_TESTING=OFF \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=java-dist \
    -DCMAKE_PREFIX_PATH=$PWD/java-dist \
    -DProtobuf_ROOT=$PWD/../cpp-jni/protobuf_ep-install \
    -DProtobuf_USE_STATIC_LIBS=ON
$ cmake --build java-jni --target install --config Release
$ ls -latr java-dist/lib/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
To build all JNI libraries (Windows) except the JNI C Data Interface library:
$ cd arrow
$ mkdir java-dist, cpp-jni
$ cmake ^
    -S cpp ^
    -B cpp-jni ^
    -DARROW_BUILD_SHARED=OFF ^
    -DARROW_CSV=ON ^
    -DARROW_DATASET=ON ^
    -DARROW_DEPENDENCY_USE_SHARED=OFF ^
    -DARROW_FILESYSTEM=ON ^
    -DARROW_GANDIVA=OFF ^
    -DARROW_JSON=ON ^
    -DARROW_ORC=ON ^
    -DARROW_PARQUET=ON ^
    -DARROW_S3=ON ^
    -DARROW_SUBSTRAIT=ON ^
    -DARROW_USE_CCACHE=ON ^
    -DARROW_WITH_BROTLI=ON ^
    -DARROW_WITH_LZ4=ON ^
    -DARROW_WITH_SNAPPY=ON ^
    -DARROW_WITH_ZLIB=ON ^
    -DARROW_WITH_ZSTD=ON ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_PREFIX=java-dist ^
    -DCMAKE_UNITY_BUILD=ON ^
    -GNinja
$ cd cpp-jni
$ ninja install
$ cd ../
$ cmake ^
    -S java ^
    -B java-jni ^
    -DARROW_JAVA_JNI_ENABLE_C=OFF ^
    -DARROW_JAVA_JNI_ENABLE_DATASET=ON ^
    -DARROW_JAVA_JNI_ENABLE_DEFAULT=ON ^
    -DARROW_JAVA_JNI_ENABLE_GANDIVA=OFF ^
    -DARROW_JAVA_JNI_ENABLE_ORC=ON ^
    -DBUILD_TESTING=OFF ^
    -DCMAKE_BUILD_TYPE=Release ^
    -DCMAKE_INSTALL_PREFIX=java-dist ^
    -DCMAKE_PREFIX_PATH=$PWD/java-dist
$ cmake --build java-jni --target install --config Release
$ dir "java-dist/bin"
|__ arrow_orc_jni/
|__ arrow_dataset_jni/
Archery#
$ cd arrow
$ archery docker run java-jni-manylinux-2014
$ ls -latr java-dist
|__ arrow_cdata_jni/
|__ arrow_dataset_jni/
|__ arrow_orc_jni/
|__ gandiva_jni/
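Whichever build route is used, the result is a set of per-library directories under the install prefix. As a hedged sketch, a small helper can report which expected directories are absent instead of reading the `ls` output by eye. The function name `missing_jni_dirs` is hypothetical, and the example uses a temporary directory in place of a real `java-dist`.

```shell
#!/bin/sh
# Print the name of each expected JNI library directory that is missing.
# $1 is the install directory; the remaining arguments are directory names.
missing_jni_dirs() {
  dist_dir=$1
  shift
  for name in "$@"; do
    [ -d "$dist_dir/$name" ] || echo "$name"
  done
}

# Example: a stand-in install directory with only two of the four libraries.
demo=$(mktemp -d)
mkdir -p "$demo/arrow_cdata_jni" "$demo/arrow_dataset_jni"
missing_jni_dirs "$demo" arrow_cdata_jni arrow_dataset_jni arrow_orc_jni gandiva_jni
rm -rf "$demo"
```

Against a real build, the first argument would be the directory listed in the steps above, for example `java-dist/lib` after the CMake build on macOS or Linux.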
Building Java JNI Modules#
To compile the JNI bindings, use the arrow-c-data Maven profile:
$ cd arrow/java
$ mvn -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib -Parrow-c-data clean install
To compile the JNI bindings for ORC / Gandiva / Dataset, use the arrow-jni Maven profile:
$ cd arrow/java
$ mvn \
    -Darrow.cpp.build.dir=<absolute path to your arrow folder>/java-dist/lib/ \
    -Darrow.c.jni.dist.dir=<absolute path to your arrow folder>/java-dist/lib/ \
    -Parrow-jni clean install
Testing#
By default, Maven uses the same Java version to both build the code and run the tests.
It is also possible to use a different JDK version for the tests. This requires Maven toolchains to be configured beforehand, and then a specific test property needs to be set.
Configuring Maven toolchains#
To use a given JDK version for testing, first register it in the Maven toolchains.xml configuration file, usually located under ${HOME}/.m2, by adding the following snippet:
<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  [...]
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>21</version> <!-- Replace with the corresponding JDK version: 11, 17, ... -->
      <vendor>temurin</vendor> <!-- Replace with the vendor/distribution: temurin, oracle, zulu ... -->
    </provides>
    <configuration>
      <jdkHome>path/to/jdk/home</jdkHome> <!-- Replace with the path to the JDK -->
    </configuration>
  </toolchain>
  [...]
</toolchains>
Testing with a specific JDK#
To run Arrow tests with a specific JDK version, use the arrow.test.jdk-version property.
For example, to run Arrow tests with JDK 17, use the following snippet:
$ cd arrow/java
$ mvn -Darrow.test.jdk-version=17 clean verify
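Because the test JDK has to be registered in toolchains.xml first, it can help to confirm the registration before starting a long test run. The sketch below is an assumption-laden illustration: `toolchain_versions` and `has_toolchain` are hypothetical helper names, and the matching is line-oriented text processing rather than real XML parsing, so it assumes each `<version>` element sits on its own line.

```shell
#!/bin/sh
# Print every <version> value found in a toolchains.xml file.
# Line-oriented, not an XML parser: one <version> element per line is assumed.
toolchain_versions() {
  sed -n 's:.*<version>\([^<]*\)</version>.*:\1:p' "$1"
}

# Succeed when the given version is registered in the file.
has_toolchain() {
  toolchain_versions "$1" | grep -qxF "$2"
}

# Example on a stand-in file that registers only JDK 21.
demo=$(mktemp)
cat > "$demo" <<'EOF'
<toolchains>
  <toolchain>
    <type>jdk</type>
    <provides>
      <version>21</version>
      <vendor>temurin</vendor>
    </provides>
  </toolchain>
</toolchains>
EOF
toolchain_versions "$demo"
if has_toolchain "$demo" 17; then
  echo "JDK 17 is registered"
else
  echo "JDK 17 is not registered"
fi
rm -f "$demo"
```

With a real configuration, the check could guard the test command, for example `has_toolchain "$HOME/.m2/toolchains.xml" 17 && mvn -Darrow.test.jdk-version=17 clean verify`.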
IDE Configuration#
IntelliJ#
To start working on Arrow in IntelliJ: build the project once from the command line using mvn clean install. Then open the java/ subdirectory of the Arrow repository, and update the following settings:
In the Files tool window, find the path vector/target/generated-sources, right click the directory, and select Mark Directory as > Generated Sources Root. There is no need to mark other generated sources directories, as only the vector module generates sources.
For JDK 11, due to an IntelliJ bug, you must go into Settings > Build, Execution, Deployment > Compiler > Java Compiler and disable “Use ‘--release’ option for cross-compilation (Java 9 and later)”. Otherwise you will get an error like “package sun.misc does not exist”.
You may want to disable error-prone entirely if it gives spurious warnings (disable both error-prone profiles in the Maven tool window and “Reload All Maven Projects”).
If using IntelliJ’s Maven integration to build, you may need to change <fork> to false in the pom.xml files due to an IntelliJ bug.
To enable debugging JNI-based modules like dataset, activate specific profiles in the Maven tab under “Profiles”. Ensure the profiles arrow-c-data, arrow-jni, generate-libs-cdata-all-os, generate-libs-jni-macos-linux, and jdk11+ are enabled, so that the IDE can build them and enable debugging.
You may not need to update all of these settings if you build/test with the IntelliJ Maven integration instead of with IntelliJ directly.
Common Errors#
When working with the JNI code: if the C++ build cannot find dependencies, with errors like these:
Could NOT find Boost (missing: Boost_INCLUDE_DIR system filesystem)
Could NOT find Lz4 (missing: LZ4_LIB)
Could NOT find zstd (missing: ZSTD_LIB)
Specify that the dependencies should be downloaded at build time (more details at Dependency Resolution):
-Dre2_SOURCE=BUNDLED \
-DBoost_SOURCE=BUNDLED \
-Dutf8proc_SOURCE=BUNDLED \
-DSnappy_SOURCE=BUNDLED \
-DORC_SOURCE=BUNDLED \
-DZLIB_SOURCE=BUNDLED
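The flags above all follow the same `-D<name>_SOURCE=BUNDLED` pattern, so they can be generated from a list of dependency names. This is only a convenience sketch; `bundled_flags` is a hypothetical helper, and the dependency names are the ones listed above.

```shell
#!/bin/sh
# Print one -D<name>_SOURCE=BUNDLED flag per dependency name given.
bundled_flags() {
  for dep in "$@"; do
    printf '%s\n' "-D${dep}_SOURCE=BUNDLED"
  done
}

# Example: the six dependencies listed above.
bundled_flags re2 Boost utf8proc Snappy ORC ZLIB
```

The output can be appended to the C++ configure step through unquoted command substitution, for example `cmake -S cpp -B cpp-jni $(bundled_flags re2 Boost)` together with the other options shown earlier. This relies on the flags containing no spaces.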
Installing Nightly Packages#
Warning
These packages are not official releases. Use them at your own risk.
Arrow nightly builds are posted on the mailing list at builds@arrow.apache.org. The artifacts are uploaded to GitHub. For example, for 2022/07/30, they can be found at GitHub Nightly.
Installing from Apache Nightlies#
Look up the nightly version number for the Arrow libraries used.
For example, for arrow-memory, visit https://nightlies.apache.org/arrow/java/org/apache/arrow/arrow-memory/ and see what versions are available (e.g. 9.0.0.dev501).
Add Apache Nightlies Repository to the Maven/Gradle project.
<properties>
  <arrow.version>9.0.0.dev501</arrow.version>
</properties>
...
<repositories>
  <repository>
    <id>arrow-apache-nightlies</id>
    <url>https://nightlies.apache.org/arrow/java</url>
  </repository>
</repositories>
...
<dependencies>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-vector</artifactId>
    <version>${arrow.version}</version>
  </dependency>
</dependencies>
...
Installing Manually#
Decide which nightly packages repository to use, for example: ursacomputing/crossbow
Add packages to your pom.xml, for example: flight-core (it depends on: arrow-format, arrow-vector, arrow-memory-core and arrow-memory-netty).
<properties>
  <maven.compiler.source>8</maven.compiler.source>
  <maven.compiler.target>8</maven.compiler.target>
  <arrow.version>9.0.0.dev501</arrow.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>flight-core</artifactId>
    <version>${arrow.version}</version>
  </dependency>
</dependencies>
Download the necessary pom and jar files to a temporary directory:
$ mkdir nightly-packaging-2022-07-30-0-github-java-jars
$ cd nightly-packaging-2022-07-30-0-github-java-jars
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-java-root-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-format-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-vector-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-core-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-memory-netty-9.0.0.dev501.jar
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/arrow-flight-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.pom
$ wget https://github.com/ursacomputing/crossbow/releases/download/nightly-packaging-2022-07-30-0-github-java-jars/flight-core-9.0.0.dev501.jar
$ tree
.
├── arrow-flight-9.0.0.dev501.pom
├── arrow-format-9.0.0.dev501.jar
├── arrow-format-9.0.0.dev501.pom
├── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core-9.0.0.dev501.jar
├── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty-9.0.0.dev501.jar
├── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector-9.0.0.dev501.jar
├── arrow-vector-9.0.0.dev501.pom
├── flight-core-9.0.0.dev501.jar
└── flight-core-9.0.0.dev501.pom
Install the artifacts to the local Maven repository with mvn install:install-file:
$ mvn install:install-file -Dfile="$(pwd)/arrow-java-root-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-java-root -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-format-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-format -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-vector-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-vector -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-core -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-memory-netty-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=arrow-memory-netty -Dversion=9.0.0.dev501 -Dpackaging=jar
$ mvn install:install-file -Dfile="$(pwd)/arrow-flight-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=arrow-flight -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.pom" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=pom
$ mvn install:install-file -Dfile="$(pwd)/flight-core-9.0.0.dev501.jar" -DgroupId=org.apache.arrow -DartifactId=flight-core -Dversion=9.0.0.dev501 -Dpackaging=jar
Validate that the packages were installed:
$ tree ~/.m2/repository/org/apache/arrow
.
├── arrow-flight
│   ├── 9.0.0.dev501
│   │   └── arrow-flight-9.0.0.dev501.pom
├── arrow-format
│   ├── 9.0.0.dev501
│   │   ├── arrow-format-9.0.0.dev501.jar
│   │   └── arrow-format-9.0.0.dev501.pom
├── arrow-java-root
│   ├── 9.0.0.dev501
│   │   └── arrow-java-root-9.0.0.dev501.pom
├── arrow-memory
│   ├── 9.0.0.dev501
│   │   └── arrow-memory-9.0.0.dev501.pom
├── arrow-memory-core
│   ├── 9.0.0.dev501
│   │   ├── arrow-memory-core-9.0.0.dev501.jar
│   │   └── arrow-memory-core-9.0.0.dev501.pom
├── arrow-memory-netty
│   ├── 9.0.0.dev501
│   │   ├── arrow-memory-netty-9.0.0.dev501.jar
│   │   └── arrow-memory-netty-9.0.0.dev501.pom
├── arrow-vector
│   ├── 9.0.0.dev501
│   │   ├── _remote.repositories
│   │   ├── arrow-vector-9.0.0.dev501.jar
│   │   └── arrow-vector-9.0.0.dev501.pom
└── flight-core
    ├── 9.0.0.dev501
    │   ├── flight-core-9.0.0.dev501.jar
    │   └── flight-core-9.0.0.dev501.pom
Compile your project as usual with mvn clean install.
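The manual install step repeats the same mvn install:install-file command once per file, with the artifact ID and packaging derivable from the file name. As a sketch only, the repetition could be replaced by a loop. The helper names `artifact_id`, `packaging`, and `install_all` are hypothetical, and the code assumes every file is named `<artifactId>-<version>.pom` or `<artifactId>-<version>.jar`, as in the listing above.

```shell
#!/bin/sh
# Derive the artifact ID from "<artifactId>-<version>.<ext>".
# $1 = file name, $2 = version
artifact_id() {
  base=${1%.*}        # drop the ".pom" or ".jar" extension
  id=${base%-"$2"}    # drop the trailing "-<version>"
  echo "$id"
}

# Derive the packaging (pom or jar) from the file extension.
packaging() {
  echo "${1##*.}"
}

# Install every matching pom and jar in the current directory.
# $1 = version; requires mvn, so it is defined here but not called.
install_all() {
  for f in *-"$1".pom *-"$1".jar; do
    [ -e "$f" ] || continue
    mvn install:install-file \
      -Dfile="$(pwd)/$f" \
      -DgroupId=org.apache.arrow \
      -DartifactId="$(artifact_id "$f" "$1")" \
      -Dversion="$1" \
      -Dpackaging="$(packaging "$f")"
  done
}

# Example on a literal file name from the listing above.
artifact_id arrow-memory-netty-9.0.0.dev501.jar 9.0.0.dev501
packaging arrow-memory-netty-9.0.0.dev501.jar
```

Running `install_all 9.0.0.dev501` from the download directory would then issue one install command per file. The loop installs all poms before the jars, which differs from the order shown above.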
Installing Staging Packages#
Warning
These packages are not official releases. Use them at your own risk.
Arrow staging builds are created when a Release Candidate (RC) is being prepared. This allows users to test the RC in their applications before voting on the release.
Installing from Apache Staging#
Look up the next version number for the Arrow libraries used.
Add Apache Staging Repository to the Maven/Gradle project.
<properties>
  <arrow.version>9.0.0</arrow.version>
</properties>
...
<repositories>
  <repository>
    <id>arrow-apache-staging</id>
    <url>https://repository.apache.org/content/repositories/staging</url>
  </repository>
</repositories>
...
<dependencies>
  <dependency>
    <groupId>org.apache.arrow</groupId>
    <artifactId>arrow-vector</artifactId>
    <version>${arrow.version}</version>
  </dependency>
</dependencies>
...

