
Configuring a developer environment

Source: vignettes/developers/setup.Rmd

The Arrow R package is unique compared to other R packages that you may have contributed to because it builds on top of the large and feature-rich Arrow C++ implementation. Because the R package integrates tightly with Arrow C++, it typically requires a dedicated copy of the library (i.e., it is usually not possible to link to a system version of libarrow during development).

Option 1: Using nightly libarrow binaries

On Linux, macOS, and Windows you can use the same workflow you might use for another package that contains compiled code (e.g., R CMD INSTALL . from a terminal, devtools::load_all() from an R prompt, or Install & Restart from RStudio). If the arrow/r/libarrow directory is not populated, the configure script will attempt to download the latest nightly libarrow binary, extract it to the arrow/r/libarrow directory (macOS, Linux) or arrow/r/windows directory (Windows), and continue building the R package as usual.
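To check ahead of time whether configure will try to download a nightly build, you can inspect the directory it looks in. The sketch below is hypothetical (libarrow_dir and libarrow_status are not part of the arrow repository) and only mirrors the behavior described above. It assumes a bash shell and that Windows builds run under an MSYS/MinGW shell whose `uname -s` starts with MINGW or MSYS.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the arrow repository.

# Print the directory the configure script uses for libarrow.
# $1: path to the arrow/r directory
# $2: OS name as reported by `uname -s` (defaults to the current system)
libarrow_dir() {
  local r_dir="$1" os="${2:-$(uname -s)}"
  case "$os" in
    MINGW*|MSYS*|CYGWIN*) echo "$r_dir/windows" ;;  # Windows (Rtools-style shells)
    *) echo "$r_dir/libarrow" ;;                    # macOS, Linux
  esac
}

# Report whether that directory already has contents. If it is missing or
# empty, configure would try to download a nightly libarrow binary into it.
libarrow_status() {
  local dir
  dir=$(libarrow_dir "$@")
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    echo "populated: $dir"
  else
    echo "empty: $dir"
  fi
}
```

For example, `libarrow_status arrow/r` run from the parent of your checkout prints which directory applies and whether it is already filled.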

Most of the time, you won’t need to update your version of libarrow because the R package rarely changes with updates to the C++ library; however, if you start to get errors when rebuilding the R package, you may have to remove the libarrow directory (macOS, Linux) or windows directory (Windows) and do a “clean” rebuild. You can do this from a terminal with R CMD INSTALL . --preclean, from RStudio using the “Clean and Install” option from the “Build” tab, or using make clean if you are using the Makefile located in the root of the R package.
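If you prefer to script the terminal route, the following sketch removes the downloaded libarrow directory and reinstalls with --preclean. The function names clean_rebuild and run_or_print and the DRY_RUN switch are hypothetical conveniences for illustration; with DRY_RUN set, the commands are only printed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: remove the downloaded libarrow and do a clean reinstall.
# Set DRY_RUN=1 to print the commands instead of running them.
run_or_print() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: path to the arrow/r directory
# $2: libarrow subdirectory: "libarrow" (macOS, Linux, the default)
#     or "windows" (Windows)
clean_rebuild() {
  local r_dir="$1" sub="${2:-libarrow}"
  run_or_print rm -rf "$r_dir/$sub"
  # Install from inside arrow/r, in a subshell so the caller's working
  # directory does not change.
  (cd "$r_dir" && run_or_print R CMD INSTALL . --preclean)
}
```

For example, `DRY_RUN=1 clean_rebuild arrow/r` shows what would happen; drop DRY_RUN to actually run it.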

Option 2: Use a local Arrow C++ development build

If you need to alter both libarrow and the R package code, or if you can’t get a binary version of the latest libarrow elsewhere, you’ll need to build it from source. This section discusses how to set up a C++ libarrow build configured to work with the R package. For more general resources, see the Arrow C++ developer guide.

There are four major steps to the process.

Step 1 - Install dependencies

When building libarrow, by default, system dependencies will be used if suitable versions are found. If system dependencies are not present, libarrow will build them during its own build process. The only dependencies that you need to install outside of the build process are cmake (for configuring the build) and openssl if you are building with S3 support.

For a faster build, you may choose to pre-install more C++ library dependencies (such as lz4, zstd, etc.) on the system so that they don’t need to be built from source in the libarrow build.

Ubuntu
sudo apt install -y cmake libcurl4-openssl-dev libssl-dev
macOS
brew install cmake openssl
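Before configuring, you can confirm that the required command-line tools are on your PATH. This is a minimal sketch with a hypothetical helper name. Finding the openssl command only suggests, and does not guarantee, that the OpenSSL development headers needed for S3 support are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named tool that cannot be found on PATH.
# Prints nothing when everything is available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools cmake openssl` prints nothing when both commands are found.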

Step 2 - Configure the libarrow build

We recommend that you configure libarrow to be built to a user-level directory rather than a system directory for your development work. This is so that the development version you are using doesn’t overwrite a released version of libarrow you may already have installed, and so that you are also able to work with more than one version of libarrow (by using different ARROW_HOME directories for the different versions).

In the example below, libarrow is installed to a directory called dist that has the same parent directory as the arrow checkout. Your installation of the Arrow R package can point to any directory with any name, though we recommend not placing it inside of the arrow git checkout directory, as unwanted changes could stop it from working properly.

export ARROW_HOME="$(pwd)/dist"
mkdir -p "$ARROW_HOME"

Special instructions on Linux: You will need to set LD_LIBRARY_PATH to the lib directory that is under where you set $ARROW_HOME, before launching R and using arrow. One way to do this is to add it to your profile (we use ~/.bash_profile here, but you might need to put this in a different file depending on your setup, e.g. if you use a shell other than bash). On macOS you do not need to do this because the macOS shared library paths are hardcoded to their locations during build time.

export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH
# \$LD_LIBRARY_PATH is escaped so it is expanded when the profile is read
echo "export LD_LIBRARY_PATH=$ARROW_HOME/lib:\$LD_LIBRARY_PATH" >> ~/.bash_profile
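One caveat with the export line above: if LD_LIBRARY_PATH was previously unset, the result ends in a trailing colon. That is an empty path entry, which the glibc dynamic loader treats as the current working directory. A common bash idiom avoids this with ${VAR:+...}, which expands only when the variable is set and non-empty. The sketch below applies it both to the current shell and to a line you could append to your profile; both function names are hypothetical and assume ARROW_HOME is already set.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assume ARROW_HOME is already set.

# Prepend $ARROW_HOME/lib for the current shell. ${LD_LIBRARY_PATH:+:...}
# expands to ":$LD_LIBRARY_PATH" only if it is non-empty, so no empty
# entry (trailing ":") is left behind.
prepend_arrow_lib() {
  export LD_LIBRARY_PATH="$ARROW_HOME/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Print a line for ~/.bash_profile: $ARROW_HOME is substituted now, while
# the single-quoted ${LD_LIBRARY_PATH...} part stays literal so it is
# evaluated each time the profile is read.
arrow_profile_line() {
  printf 'export LD_LIBRARY_PATH="%s/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"\n' "$ARROW_HOME"
}
```

For example, `arrow_profile_line >> ~/.bash_profile` records the setting once.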

Start by navigating in a terminal to the arrow repository. You will need to create a directory into which the C++ build will put its contents. We recommend that you make a build directory inside of the cpp directory of the Arrow git repository (it is git-ignored, so you won’t accidentally check it in). Next, change directories to be inside cpp/build:

pushd arrow
mkdir -p cpp/build
pushd cpp/build

You’ll first call cmake to configure the build and then make install. For the R package, you’ll need to enable several features in libarrow using -D flags:

Linux / Mac OS
cmake \
  -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
  -DCMAKE_INSTALL_LIBDIR=lib \
  -DARROW_COMPUTE=ON \
  -DARROW_CSV=ON \
  -DARROW_DATASET=ON \
  -DARROW_EXTRA_ERROR_CONTEXT=ON \
  -DARROW_FILESYSTEM=ON \
  -DARROW_INSTALL_NAME_RPATH=OFF \
  -DARROW_JEMALLOC=ON \
  -DARROW_JSON=ON \
  -DARROW_PARQUET=ON \
  -DARROW_WITH_SNAPPY=ON \
  -DARROW_WITH_ZLIB=ON \
  ..

.. refers to the C++ source directory: you’re in cpp/build and the source is in cpp.
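Because a stray space after a line-continuation backslash breaks a long command like this one, one option is to keep the flags in a bash array, where each element is exactly one argument. The sketch below wraps the flags listed above in a hypothetical arrow_cmake function (not part of the arrow repository). Extra flags, such as the optional features, can be passed as arguments, and setting DRY_RUN prints the command one argument per line instead of running it. It assumes you call it from cpp/build with ARROW_HOME set.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the cmake call above; run it from cpp/build.
# Extra -D flags given as arguments are appended after the defaults.
# Set DRY_RUN=1 to print the command (one argument per line) instead.
arrow_cmake() {
  local -a args=(
    -DCMAKE_INSTALL_PREFIX="$ARROW_HOME"
    -DCMAKE_INSTALL_LIBDIR=lib
    -DARROW_COMPUTE=ON
    -DARROW_CSV=ON
    -DARROW_DATASET=ON
    -DARROW_EXTRA_ERROR_CONTEXT=ON
    -DARROW_FILESYSTEM=ON
    -DARROW_INSTALL_NAME_RPATH=OFF
    -DARROW_JEMALLOC=ON
    -DARROW_JSON=ON
    -DARROW_PARQUET=ON
    -DARROW_WITH_SNAPPY=ON
    -DARROW_WITH_ZLIB=ON
    "$@"
  )
  # ".." is the C++ source directory, relative to cpp/build.
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' cmake "${args[@]}" ..
  else
    cmake "${args[@]}" ..
  fi
}
```

For example, `arrow_cmake -DARROW_S3=ON -DARROW_WITH_ZSTD=ON` configures the base build plus S3 and zstd support.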

Enabling more Arrow features

To enable optional features, including S3 support, an alternative memory allocator, and additional compression libraries, add some or all of these flags to your call to cmake (the trailing \ makes them easier to paste into a bash shell on a new line):

  -DARROW_GCS=ON \
  -DARROW_MIMALLOC=ON \
  -DARROW_S3=ON \
  -DARROW_WITH_BROTLI=ON \
  -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_LZ4=ON \
  -DARROW_WITH_SNAPPY=ON \
  -DARROW_WITH_ZSTD=ON \

Other flags that may be useful:

  • -DBoost_SOURCE=BUNDLED and -DThrift_SOURCE=BUNDLED, for example, or any other dependency *_SOURCE, if you have a system version of a C++ dependency that doesn’t work correctly with Arrow. This tells the build to compile its own version of the dependency from source.

  • -DCMAKE_BUILD_TYPE=debug or -DCMAKE_BUILD_TYPE=relwithdebinfo can be useful for debugging. You probably don’t want to do this generally because a debug build is much slower at runtime than the default release build.

  • -DARROW_BUILD_STATIC=ON and -DARROW_BUILD_SHARED=OFF if you want to use static libraries instead of dynamic libraries. With static libraries there isn’t a risk of the R package linking to the wrong library, but it does mean if you change the C++ code you have to recompile both the C++ libraries and the R package. Compilers typically will link to static libraries only if the dynamic ones are not present, which is why we need to set -DARROW_BUILD_SHARED=OFF. If you are switching after compiling and installing previously, you may need to remove the .dll or .so files from $ARROW_HOME/dist/bin.

Note that cmake is particularly sensitive to whitespace; if you see errors, check that you don’t have any errant whitespace.
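A frequent culprit is a space or tab after a trailing backslash: the backslash then escapes that space instead of the newline, so the command ends on that line and the remaining flags run as separate commands. If you keep your cmake command in a file, a grep for this pattern finds it. A minimal sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "line:text" (or "file:line:text" when given
# several files) for each line where whitespace follows a trailing backslash.
find_bad_continuations() {
  grep -nE '\\[[:space:]]+$' "$@"
}
```

For example, `find_bad_continuations build-arrow.sh` (a file name chosen here for illustration) lists the offending lines, or prints nothing if there are none.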

Step 3 - Building libarrow

You can add -j# at the end of the command here too to speed up compilation by running in parallel (where # is the number of cores you have available).

cmake --build . --target install -j8
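Rather than hard-coding 8, you can ask the system how many cores it has. In this minimal sketch the function name is hypothetical. It uses nproc, which is available on most Linux systems, or `sysctl -n hw.ncpu` on macOS, and falls back to a single job otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of available CPU cores.
ncores() {
  if command -v nproc >/dev/null 2>&1; then
    nproc                      # GNU coreutils (most Linux systems)
  elif sysctl -n hw.ncpu >/dev/null 2>&1; then
    sysctl -n hw.ncpu          # macOS / BSD
  else
    echo 1                     # unknown platform: build serially
  fi
}
```

You could then run `cmake --build . --target install -j"$(ncores)"`.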

Step 4 - Build the Arrow R package

Once you’ve built libarrow, you can install the R package and its dependencies, along with additional dev dependencies, from the git checkout like below. You might need to either pick and set a repository interactively or you could add a repository to the install.packages() command with repos = "https://cloud.r-project.org".

popd # To go back to the root directory of the project, from cpp/build
pushd r
R -e 'install.packages("remotes"); remotes::install_deps(dependencies = TRUE)'
R CMD INSTALL --no-multiarch .

The --no-multiarch flag makes it only compile on the “main” architecture. This will compile for the architecture that the R in your path corresponds to. If you compile on one architecture and then switch to another, make sure to pass the --preclean flag so that the R package code is recompiled for the new architecture. Otherwise, you may see errors like LoadLibrary failure: %1 is not a valid Win32 application.
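If you switch architectures often, you could record the architecture of the last build and add --preclean only when it changes. This is a hypothetical sketch: the r_install_flags name and the stamp file are made up for illustration. Ideally you pass R’s own architecture as the second argument, for example from `Rscript -e 'cat(R.version$arch)'`, because the `uname -m` default may not match the R you are using.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print flags for R CMD INSTALL, adding --preclean when
# the architecture differs from the one recorded by the previous call.
# $1: stamp file that remembers the last architecture
# $2: architecture to build for (defaults to `uname -m`)
r_install_flags() {
  local stamp="$1" arch="${2:-$(uname -m)}" flags="--no-multiarch"
  if [ -f "$stamp" ] && [ "$(cat "$stamp")" != "$arch" ]; then
    flags="$flags --preclean"
  fi
  printf '%s\n' "$arch" > "$stamp"
  echo "$flags"
}
```

For example, from arrow/r: `R CMD INSTALL $(r_install_flags .last-r-arch) .`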

Compilation flags

If you need to set any compilation flags while building the C++ extensions, you can use the ARROW_R_CXXFLAGS environment variable. For example, if you are using perf to profile the R extensions, you may need to set:

export ARROW_R_CXXFLAGS=-fno-omit-frame-pointer
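If you set this from a shell profile alongside other flags, you may want to append rather than overwrite, and avoid adding the same flag twice when the profile is sourced repeatedly. A minimal sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a flag to ARROW_R_CXXFLAGS unless it is
# already present, preserving any flags set earlier.
add_arrow_r_cxxflag() {
  case " ${ARROW_R_CXXFLAGS:-} " in
    *" $1 "*) ;;  # already present: leave the variable unchanged
    *) export ARROW_R_CXXFLAGS="${ARROW_R_CXXFLAGS:+$ARROW_R_CXXFLAGS }$1" ;;
  esac
}
```

For example, `add_arrow_r_cxxflag -fno-omit-frame-pointer` can safely be run more than once.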
Recompiling the C++ code

With the setup described here, you should not need to rebuild the Arrow library or even the C++ source in the R package as you iterate and work on the R package. The only time those should need to be rebuilt is if you have changed the C++ in the R package (and even then, R CMD INSTALL . should only need to recompile the files that have changed) or if the libarrow C++ has changed and there is a mismatch between libarrow and the R package. If you find yourself rebuilding either or both each time you install the package or run tests, something is probably wrong with your setup.
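As a rough way to apply this rule, you could look at which files changed (for example with `git diff --name-only HEAD~1`) and check whether any are under cpp/ (libarrow) or r/src/ (the R package’s C++ code). The helper below is a hypothetical sketch of that check, not a tool from the arrow repository. Changes elsewhere, such as in r/R/, need only a normal reinstall.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read changed paths (relative to the repository root,
# one per line) on stdin and print which compiled parts may need rebuilding.
rebuild_needed() {
  local path libarrow=0 rcpp=0
  while IFS= read -r path; do
    case "$path" in
      cpp/*) libarrow=1 ;;  # libarrow sources
      r/src/*) rcpp=1 ;;    # C++ code in the R package
    esac
  done
  if [ "$libarrow" -eq 1 ]; then echo "libarrow"; fi
  if [ "$rcpp" -eq 1 ]; then echo "r-package-cpp"; fi
}
```

For example: `git diff --name-only HEAD~1 | rebuild_needed`.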

For a full build, use a cmake command with all of the R-relevant optional dependencies turned on. Development with other languages might require different flags as well. For example, to develop Python, you would also need to add -DARROW_PYTHON=ON (though all of the other flags used for Python are already included here).

cmake \
  -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
  -DCMAKE_INSTALL_LIBDIR=lib \
  -DARROW_COMPUTE=ON \
  -DARROW_CSV=ON \
  -DARROW_DATASET=ON \
  -DARROW_EXTRA_ERROR_CONTEXT=ON \
  -DARROW_FILESYSTEM=ON \
  -DARROW_GCS=ON \
  -DARROW_INSTALL_NAME_RPATH=OFF \
  -DARROW_JEMALLOC=ON \
  -DARROW_JSON=ON \
  -DARROW_MIMALLOC=ON \
  -DARROW_PARQUET=ON \
  -DARROW_S3=ON \
  -DARROW_WITH_BROTLI=ON \
  -DARROW_WITH_BZ2=ON \
  -DARROW_WITH_LZ4=ON \
  -DARROW_WITH_SNAPPY=ON \
  -DARROW_WITH_ZLIB=ON \
  -DARROW_WITH_ZSTD=ON \
  ..

Installing a version of the R package with a specific git reference

If you need an arrow installation from a specific repository or git reference, on most platforms except Windows, you can run:

remotes::install_github("apache/arrow/r", build=FALSE)

The build = FALSE argument is important so that the installation can access the C++ source in the cpp/ directory in apache/arrow.

As with other installation methods, setting the environment variables LIBARROW_MINIMAL=false and ARROW_R_DEV=true will provide a more full-featured version of Arrow and provide more verbose output, respectively.

For example, to install from the (fictional) branch bugfix from apache/arrow you could run:

Sys.setenv(LIBARROW_MINIMAL = "false")
remotes::install_github("apache/arrow/r@bugfix", build = FALSE)
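The same variables can be set from a shell for a single command, for instance when installing from a local checkout with R CMD INSTALL. A minimal sketch with a hypothetical wrapper name: the variables apply only to the command it runs and are not left set in your shell.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command with LIBARROW_MINIMAL=false (more
# features) and ARROW_R_DEV=true (more verbose output) set only for it.
with_full_arrow_build() {
  LIBARROW_MINIMAL=false ARROW_R_DEV=true "$@"
}
```

For example, from arrow/r: `with_full_arrow_build R CMD INSTALL .`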

Developers may wish to use this method of installing a specific commit separate from another Arrow development environment or system installation (e.g. we use this in arrowbench to install development versions of libarrow isolated from the system install). If you already have libarrow installed system-wide, you may need to set some additional variables in order to isolate this build from your system libraries:

  • Setting the environment variable FORCE_BUNDLED_BUILD to true will skip the pkg-config search for libarrow and attempt to build from the same source at the repository+ref given.

  • You may also need to set the Makevars CPPFLAGS and LDFLAGS to "" in order to prevent the installation process from attempting to link to already installed system versions of libarrow. One way to do this temporarily is wrapping your remotes::install_github() call like so:

withr::with_makevars(
  list(CPPFLAGS = "", LDFLAGS = ""),
  remotes::install_github(...)
)

Summary of environment variables

  • See the user-facing article on installation for a large number of environment variables that determine how the build works and what features get built.
  • ARROW_OFFLINE_BUILD: When set to true, the build script will not download a prebuilt C++ library binary or, if needed, cmake. It will turn off any features that require a download, unless they’re available in ARROW_THIRDPARTY_DEPENDENCY_DIR or the tools/thirdparty_download/ subfolder. create_package_with_all_dependencies() creates that subfolder.
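Because an offline build turns off features whose dependencies it cannot find, you may want a quick pre-flight check before starting one. The sketch below is hypothetical. It only confirms that ARROW_THIRDPARTY_DEPENDENCY_DIR points to a non-empty directory; it does not verify which dependencies that directory contains, and it does not check the tools/thirdparty_download/ subfolder.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for ARROW_OFFLINE_BUILD=true: succeed only if
# ARROW_THIRDPARTY_DEPENDENCY_DIR names a directory that has some contents.
offline_ready() {
  local dir="${ARROW_THIRDPARTY_DEPENDENCY_DIR:-}"
  if [ -z "$dir" ]; then
    echo "ARROW_THIRDPARTY_DEPENDENCY_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
    echo "no dependency sources found in $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}
```

For example: `offline_ready && ARROW_OFFLINE_BUILD=true R CMD INSTALL .`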

Troubleshooting

Note that after any change to libarrow, you must reinstall it and run make clean or git clean -fdx . to remove any cached object code in the r/src/ directory before reinstalling the R package. This is only necessary if you make changes to libarrow source; you do not need to manually purge object files if you are only editing R or C++ code inside r/.
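If you want to see what cached compiled output is in r/src/ before cleaning, a find command can list it. The sketch below is a hypothetical, narrower alternative to make clean or git clean -fdx .: it handles only object files and shared libraries and leaves other generated files alone. The commands above therefore remain the thorough option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list compiled objects and shared libraries under the
# R package's src/ directory; with DELETE=1, also remove them.
# $1: path to the src directory (defaults to r/src)
clean_r_src() {
  local src="${1:-r/src}"
  local -a extra=()
  if [ -n "${DELETE:-}" ]; then
    extra=(-delete)
  fi
  find "$src" -type f \( -name '*.o' -o -name '*.so' -o -name '*.dll' \) \
    -print "${extra[@]}"
}
```

For example, run `clean_r_src` from the repository root to preview the files, then `DELETE=1 clean_r_src` to remove them.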

Arrow library - R package mismatches

If libarrow and the R package have diverged, you will see errors like:

Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
  dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Symbol not found: __ZN5arrow2io16RandomAccessFile9ReadAsyncERKNS0_9IOContextExx
  Referenced from: /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
  Expected in: flat namespace
 in /Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so
Error: loading failed
Execution halted
ERROR: loading failed

To resolve this, try rebuilding the Arrow library.

Multiple versions of libarrow

If you are installing from a user-level directory, and you already have a previous installation of libarrow in a system directory, you may get errors like the following when you install the R package:

Error: package or namespace load failed for ‘arrow' in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so':
  dlopen(/Library/Frameworks/R.framework/Versions/4.0/Resources/library/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: /usr/local/lib/libarrow.400.dylib
  Referenced from: /usr/local/lib/libparquet.400.dylib
  Reason: image not found

If this happens, you need to make sure that you don’t let R link to your system library when building arrow. You can do this in a number of different ways:

  • Setting the MAKEFLAGS environment variable to "LDFLAGS=" (see below for an example); this is the recommended way to accomplish this
  • Using {withr}’s with_makevars(list(LDFLAGS = ""), ...)
  • Adding LDFLAGS= to your ~/.R/Makevars file (the least recommended way, though it is a common debugging approach suggested online)
MAKEFLAGS="LDFLAGS=" R CMD INSTALL .
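To confirm that a second copy is the culprit, you can look for libarrow libraries in the usual system locations while ignoring your own $ARROW_HOME. A minimal sketch with a hypothetical helper name: /usr/local/lib and /usr/lib are only default guesses, so pass the directories that apply to your system (e.g. the one named in the error message).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list libarrow* files directly inside the given
# directories (default: /usr/local/lib and /usr/lib), skipping any
# directory under $ARROW_HOME.
other_libarrows() {
  local dir
  if [ "$#" -eq 0 ]; then
    set -- /usr/local/lib /usr/lib
  fi
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    if [ -n "${ARROW_HOME:-}" ]; then
      case "$dir/" in
        "$ARROW_HOME"/*) continue ;;  # your development build, not a conflict
      esac
    fi
    find "$dir" -maxdepth 1 -name 'libarrow*' ! -type d
  done | sort
}
```

For example, `other_libarrows /usr/local/lib` would list a stray /usr/local/lib/libarrow.400.dylib like the one in the error above.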

rpath issues

If the package fails to install/load with an error like this:

  ** testing if installed package can be loaded from temporary location
  Error: package or namespace load failed for 'arrow' in dyn.load(file, DLLpath = DLLpath, ...):
  unable to load shared object '/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so':
  dlopen(/Users/you/R/00LOCK-r/00new/arrow/libs/arrow.so, 6): Library not loaded: @rpath/libarrow.14.dylib

ensure that -DARROW_INSTALL_NAME_RPATH=OFF was passed (this is important on macOS to prevent problems at link time and is a no-op on other platforms). Alternatively, try setting the environment variable R_LD_LIBRARY_PATH to wherever Arrow C++ was put in make install, e.g. export R_LD_LIBRARY_PATH=/usr/local/lib, and retry installing the R package.

When installing from source, if the R and C++ library versions do not match, installation may fail. If you’ve previously installed the libraries and want to upgrade the R package, you’ll need to update the Arrow C++ library first.

For any other build/configuration challenges, see the C++ developer guide.

Other installation issues

There are a number of scripts that are triggered when the arrow R package is installed. For package users who are not interacting with the underlying code, these should all just work without configuration and pull in the most complete pieces (e.g. official binaries that we host). However, knowing about these scripts can help package developers troubleshoot if things go wrong in them or things go wrong in an install. See the article on R package installation for more information.

