Connecting to a Flight server

Source: vignettes/flight.Rmd

Arrow Flight is a general-purpose client-server framework for high-performance transport of large datasets over network interfaces, built as part of the Apache Arrow project. It allows for highly efficient data transfer by several means:

  • Flight removes the need for deserialization during data transfer.
  • Flight allows for parallel data streaming.
  • Flight employs optimizations designed to take advantage of Arrow’s columnar format.

The arrow package provides methods for connecting to Flight servers to send and receive data.

Prerequisites

At present the arrow package in R does not supply an independent implementation of Arrow Flight: it works by calling Flight methods supplied by PyArrow in Python, and requires both the reticulate package and the Python PyArrow library to be installed. If you are using them for the first time you can install them like this:

install.packages("reticulate")
arrow::install_pyarrow()

See the Python integrations article for more details on setting up pyarrow.
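Before starting a server or client, it can help to confirm that both prerequisites are visible from the current R session. The following is a minimal sketch, not part of the arrow package: `flight_prereqs()` is a hypothetical helper name, and it relies only on `requireNamespace()` from base R and `reticulate::py_module_available()`.

```r
# Hypothetical helper (not part of arrow or reticulate): report whether the
# two prerequisites named above can be found from this R session.
flight_prereqs <- function() {
  has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
  # Only ask reticulate about Python modules if reticulate itself is installed.
  # Note: this call initializes Python for the session as a side effect.
  has_pyarrow <- has_reticulate &&
    isTRUE(reticulate::py_module_available("pyarrow"))
  c(reticulate = has_reticulate, pyarrow = has_pyarrow)
}

status <- flight_prereqs()
print(status)
```

If either entry is `FALSE`, the installation commands above are the place to start.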

Example

The package includes methods for starting a Python-based Flight server, as well as methods for connecting to a Flight server running elsewhere. To illustrate both sides, in one R process we’ll start a demo server:

library(arrow)
demo_server <- load_flight_server("demo_flight_server")
server <- demo_server$DemoFlightServer(port = 8089)
server$serve()

We’ll leave that one running.

In a different R process, let’s connect to it and put some data in it.

library(arrow)
client <- flight_connect(port = 8089)
flight_put(client, iris, path = "test_data/iris")
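To confirm that an upload actually registered on the server, the client can ask the server which paths it holds. The sketch below wraps that check in a hypothetical helper, `put_and_verify()`, which is not part of arrow; it assumes the arrow functions `flight_put()`, `flight_path_exists()` and `list_flights()`, and the usage lines assume the demo server above is still listening on port 8089.

```r
# Hypothetical helper (not part of arrow): upload a data frame, then confirm
# that the server lists it under `path`.
put_and_verify <- function(client, data, path) {
  arrow::flight_put(client, data, path = path)
  # flight_path_exists() reports whether `path` is among the server's flights.
  if (!arrow::flight_path_exists(client, path)) {
    stop("Upload did not register on the server: ", path)
  }
  # Return the full list of paths the server currently holds.
  invisible(arrow::list_flights(client))
}

# Usage, assuming the demo server is running locally on port 8089:
# client <- arrow::flight_connect(host = "localhost", port = 8089)
# flights <- put_and_verify(client, iris, "test_data/iris")
# print(flights)
```

The usage lines are commented out because they only succeed while a Flight server is reachable.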

Now, in yet another R process, we can connect to the server and pull the data we put there:

library(arrow)
library(dplyr)
client <- flight_connect(port = 8089)
client |>
  flight_get("test_data/iris") |>
  group_by(Species) |>
  summarize(max_petal = max(Petal.Length))
## # A tibble: 3 x 2
##   Species    max_petal
##   <fct>          <dbl>
## 1 setosa           1.9
## 2 versicolor       5.1
## 3 virginica        6.9

Because flight_get() returns an Arrow data structure, you can directly pipe its result into a dplyr workflow. See the article on data wrangling for more information on working with Arrow objects via a dplyr interface.
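The same pipeline can be tried without a server by using a local Arrow Table as a stand-in for the result of `flight_get()`. This is a sketch under that assumption. Depending on the arrow version, dplyr verbs on an Arrow Table may build a lazy query instead of returning a tibble immediately, so the example ends with `collect()` to pull the result into R.

```r
library(arrow)
library(dplyr)

# Stand-in for the result of flight_get(): an in-memory Arrow Table.
iris_tbl <- arrow_table(iris)

result <- iris_tbl |>
  group_by(Species) |>
  summarize(max_petal = max(Petal.Length)) |>
  collect()  # run the query and return a tibble

# Row order is not guaranteed, so look values up by species name.
max_by_species <- setNames(result$max_petal, as.character(result$Species))
print(max_by_species[c("setosa", "versicolor", "virginica")])
```

Calling `collect()` on a result that is already a data frame simply returns it, so the final step is safe to keep either way.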

Further reading

