Arrow Flight is a general-purpose client-server framework for high-performance transport of large datasets over network interfaces, built as part of the Apache Arrow project. It allows for highly efficient data transfer by several means:
- Flight removes the need for deserialization during data transfer.
- Flight allows for parallel data streaming.
- Flight employs optimizations designed to take advantage of Arrow's columnar format.
The arrow package provides methods for connecting to Flight servers to send and receive data.
Prerequisites
At present the arrow package in R does not supply an independent implementation of Arrow Flight: it works by calling Flight methods supplied by PyArrow in Python, and requires both the reticulate package and the Python PyArrow library to be installed. If you are using them for the first time you can install them like this:

    install.packages("reticulate")
    arrow::install_pyarrow()

See the python integrations article for more details on setting up pyarrow.
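Before starting a server or client, it can help to confirm that both prerequisites are visible from the current R session. The following is a minimal sketch, not part of the arrow package: the variable names `has_reticulate` and `has_pyarrow` are illustrative, and it assumes that `reticulate::py_module_available()` can locate the Python environment you intend to use.

```r
# Check that the R-side prerequisite (reticulate) is installed.
has_reticulate <- requireNamespace("reticulate", quietly = TRUE)

# Only ask reticulate about pyarrow if reticulate itself is available.
has_pyarrow <- has_reticulate && reticulate::py_module_available("pyarrow")

if (!has_reticulate) {
  message("reticulate is missing; try install.packages(\"reticulate\")")
} else if (!has_pyarrow) {
  message("pyarrow is missing; try arrow::install_pyarrow()")
} else {
  message("Flight prerequisites appear to be available.")
}
```

If the check reports that pyarrow is missing even after installation, reticulate may be bound to a different Python environment than the one pyarrow was installed into.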
Example
The package includes methods for starting a Python-based Flight server, as well as methods for connecting to a Flight server running elsewhere. To illustrate both sides, in one R process we'll start a demo server:

    library(arrow)
    demo_server <- load_flight_server("demo_flight_server")
    server <- demo_server$DemoFlightServer(port = 8089)
    server$serve()

We'll leave that one running.
In a different R process, let's connect to it and put some data in it.

    library(arrow)
    client <- flight_connect(port = 8089)
    flight_put(client, iris, path = "test_data/iris")

Now, in yet another R process, we can connect to the server and pull the data we put there:

    library(arrow)
    library(dplyr)
    client <- flight_connect(port = 8089)
    client |>
      flight_get("test_data/iris") |>
      group_by(Species) |>
      summarize(max_petal = max(Petal.Length))

    ## # A tibble: 3 x 2
    ##   Species    max_petal
    ##   <fct>          <dbl>
    ## 1 setosa           1.9
    ## 2 versicolor       5.1
    ## 3 virginica        6.9

Because flight_get() returns an Arrow data structure, you can directly pipe its result into a dplyr workflow. See the article on data wrangling for more information on working with Arrow objects via a dplyr interface.
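The same dplyr pipeline can be tried without a running Flight server. The sketch below assumes the arrow and dplyr packages are installed and uses a local Arrow Table, built with `arrow_table()`, as a stand-in for the object that `flight_get()` returns; the stand-in and the names `iris_tbl` and `result` are assumptions made for illustration only.

```r
library(arrow)
library(dplyr)

# Stand-in for the result of flight_get(): a local Arrow Table.
iris_tbl <- arrow_table(iris)

# The dplyr verbs are applied to the Arrow Table; collect() pulls the
# summarized result into R as a tibble.
result <- iris_tbl |>
  group_by(Species) |>
  summarize(max_petal = max(Petal.Length)) |>
  collect()

result
```

Calling `collect()` explicitly makes it clear where the data leaves Arrow and becomes an ordinary R data frame. The row order of the collected result may differ from the printed output shown earlier.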
Further reading
- The specification of the Flight remote procedure call protocol is listed on the Arrow project homepage.
- The Arrow C++ documentation contains a list of best practices for Arrow Flight.
- A detailed worked example of an Arrow Flight server in Python is provided in the Apache Arrow Python Cookbook.