Arrow Flight RPC#

Arrow Flight is an RPC framework for high-performance data services based on Arrow data, and is built on top of gRPC and the IPC format.

Flight is organized around streams of Arrow record batches, being either downloaded from or uploaded to another service. A set of metadata methods offers discovery and introspection of streams, as well as the ability to implement application-specific methods.

Methods and message wire formats are defined by Protobuf, enabling interoperability with clients that may support gRPC and Arrow separately, but not Flight. However, Flight implementations include further optimizations to avoid overhead in usage of Protobuf (mostly around avoiding excessive memory copies).

RPC Methods and Request Patterns#

Flight defines a set of RPC methods for uploading/downloading data, retrieving metadata about a data stream, listing available data streams, and for implementing application-specific RPC methods. A Flight service implements some subset of these methods, while a Flight client can call any of these methods.

Data streams are identified by descriptors (the FlightDescriptor message), which are either a path or an arbitrary binary command. For instance, the descriptor may encode a SQL query, a path to a file on a distributed file system, or even a pickled Python object; the application can use this message as it sees fit.
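As a rough illustration of the two descriptor kinds, the sketch below models a descriptor with plain Python dataclasses. It is not the API of any Flight library: the class, the helper constructors `for_path` and `for_command`, and the sample values are hypothetical. Only the field names and enum values follow the FlightDescriptor message in the Protocol Buffer Definitions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class DescriptorType(Enum):
    # Values mirror FlightDescriptor.DescriptorType in the protobuf definition.
    UNKNOWN = 0
    PATH = 1  # a list of strings, conceptually like a filesystem path
    CMD = 2   # opaque bytes that only the application interprets


@dataclass(frozen=True)
class FlightDescriptor:
    """Illustrative stand-in for the FlightDescriptor message."""
    type: DescriptorType
    cmd: bytes = b""
    path: Tuple[str, ...] = ()

    @classmethod
    def for_path(cls, *segments: str) -> "FlightDescriptor":
        # Hypothetical helper: build a PATH descriptor.
        return cls(DescriptorType.PATH, path=tuple(segments))

    @classmethod
    def for_command(cls, command: bytes) -> "FlightDescriptor":
        # Hypothetical helper: build a CMD descriptor.
        return cls(DescriptorType.CMD, cmd=command)


# Made-up example values: a file-like path and a SQL query encoded as bytes.
by_path = FlightDescriptor.for_path("warehouse", "sales", "2024.parquet")
by_query = FlightDescriptor.for_command(b"SELECT * FROM sales")
```

Only one of `cmd` or `path` is meaningful for a given descriptor, which matches the "should only be defined when" comments in the protobuf definition.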

Thus, one Flight client can connect to any service and perform basic operations. To facilitate this, Flight services are expected to support some common request patterns, described next. Of course, applications may ignore compatibility and simply treat the Flight RPC methods as low-level building blocks for their own purposes.

See Protocol Buffer Definitions for full details on the methods and messages involved.

Downloading Data#

A client that wishes to download the data would:

sequenceDiagram
    autonumber
    participant Client
    participant Metadata Server
    participant Data Server
    Client->>Metadata Server: GetFlightInfo(FlightDescriptor)
    Metadata Server->>Client: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]}
    Note over Client, Data Server: This may be parallelized
    loop for each endpoint in FlightInfo.endpoints
        Client->>Data Server: DoGet(Ticket)
        Data Server->>Client: stream of FlightData
    end

Retrieving data via DoGet.#

  1. Construct or acquire a FlightDescriptor for the data set they are interested in.

    A client may know what descriptor they want already, or they may use methods like ListFlights to discover them.

  2. Call GetFlightInfo(FlightDescriptor) to get a FlightInfo message.

    Flight does not require that data live on the same server as metadata. Hence, FlightInfo contains details on where the data is located, so the client can go fetch the data from an appropriate server. This is encoded as a series of FlightEndpoint messages inside FlightInfo. Each endpoint represents some location that contains a subset of the response data.

    An endpoint contains a list of locations (server addresses) where this data can be retrieved from, and a Ticket, an opaque binary token that the server will use to identify the data being requested.

    If FlightInfo.ordered is true, this signals there is some order between data from different endpoints. Clients should produce the same results as if the data returned from each of the endpoints was concatenated, in order, from front to back.

    If FlightInfo.ordered is false, the client may return data from any of the endpoints in arbitrary order. Data from any specific endpoint must be returned in order, but the data from different endpoints may be interleaved to allow parallel fetches.

    Note that since some clients may ignore FlightInfo.ordered, if ordering is important and client support cannot be ensured, servers should return a single endpoint.

    The response also contains other metadata, like the schema, and optionally an estimate of the dataset size.

  3. Consume each endpoint returned by the server.

    To consume an endpoint, the client should connect to one of the locations in the endpoint, then call DoGet(Ticket) with the ticket in the endpoint. This will give the client a stream of Arrow record batches.

    If the server wishes to indicate that the data is on the local server and not a different location, then it can return an empty list of locations. The client can then reuse the existing connection to the original server to fetch data. Otherwise, the client must connect to one of the indicated locations.

    The server may list “itself” as a location alongside other server locations. Normally this requires the server to know its public address, but it may also use the special URI string arrow-flight-reuse-connection://? to tell clients that they may reuse an existing connection to the same server, without having to be able to name itself. See Connection Reuse below.

    In this way, the locations inside an endpoint can also be thought of as performing look-aside load balancing or service discovery functions. And the endpoints can represent data that is partitioned or otherwise distributed.

    The client must consume all endpoints to retrieve the complete data set. The client can consume endpoints in any order, or even in parallel, or distribute the endpoints among multiple machines for consumption; this is up to the application to implement. The client can also take FlightInfo.ordered into account; see the previous item for details.

    Each endpoint may have an expiration time (FlightEndpoint.expiration_time). If an endpoint has an expiration time, the client can fetch the data multiple times with DoGet until the expiration time is reached. Otherwise, it is application-defined whether DoGet requests may be retried. The expiration time is represented as google.protobuf.Timestamp.

    If the expiration time is short, the client may be able to extend it with the RenewFlightEndpoint action. The client needs to use DoAction with the RenewFlightEndpoint action type to extend the expiration time. Action.body must be a RenewFlightEndpointRequest that contains the FlightEndpoint to be renewed.

    The client may be able to cancel the returned FlightInfo with the CancelFlightInfo action. The client needs to use DoAction with the CancelFlightInfo action type to cancel the FlightInfo.
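The download steps above can be sketched with an in-memory stand-in for the data server. Everything here is illustrative: `Endpoint`, `FlightInfo`, `fetch_all`, `fake_do_get`, and the `STORE` contents are hypothetical names and data, lists of integers stand in for Arrow record batches, and no real Flight library is involved. The sketch reads endpoints front to back, which is one simple strategy that is valid whether or not `ordered` is set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List

Batch = List[int]  # stand-in for an Arrow record batch


@dataclass
class Endpoint:
    ticket: bytes  # opaque token, only meaningful to the server


@dataclass
class FlightInfo:
    endpoints: List[Endpoint]
    ordered: bool = False


def fetch_all(info: FlightInfo,
              do_get: Callable[[Endpoint], Iterator[Batch]]) -> List[Batch]:
    """Consume every endpoint sequentially, front to back.

    Sequential reads satisfy ordered=True. With ordered=False a client
    could instead fetch endpoints in parallel and interleave the results.
    """
    batches: List[Batch] = []
    for endpoint in info.endpoints:
        batches.extend(do_get(endpoint))
    return batches


# Hypothetical data server: maps tickets to the batches they identify.
STORE: Dict[bytes, List[Batch]] = {
    b"part-0": [[1, 2], [3]],
    b"part-1": [[4, 5]],
}


def fake_do_get(endpoint: Endpoint) -> Iterator[Batch]:
    yield from STORE[endpoint.ticket]


info = FlightInfo(endpoints=[Endpoint(b"part-0"), Endpoint(b"part-1")],
                  ordered=True)
result = fetch_all(info, fake_do_get)
```

Skipping any endpoint would silently drop part of the data set, which is why the loop visits all of `info.endpoints`.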

Downloading Data by Running a Heavy Query#

A client may need to run a heavy query to download data. However, GetFlightInfo doesn’t return until the query completes, so the client is blocked. In this situation, the client can use PollFlightInfo instead of GetFlightInfo:

sequenceDiagram
    autonumber
    participant Client
    participant Metadata Server
    participant Data Server
    Client->>Metadata Server: PollFlightInfo(FlightDescriptor)
    Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor', ...}
    Client->>Metadata Server: PollFlightInfo(FlightDescriptor')
    Metadata Server->>Client: PollInfo{descriptor: FlightDescriptor'', ...}
    Client->>Metadata Server: PollFlightInfo(FlightDescriptor'')
    Metadata Server->>Client: PollInfo{descriptor: null, info: FlightInfo{endpoints: [FlightEndpoint{ticket: Ticket}, …]}
    Note over Client, Data Server: This may be parallelized
    Note over Client, Data Server: Some endpoints may be processed while polling
    loop for each endpoint in FlightInfo.endpoints
        Client->>Data Server: DoGet(Ticket)
        Data Server->>Client: stream of FlightData
    end

Polling a long-running query by PollFlightInfo.#

  1. Construct or acquire a FlightDescriptor, as before.

  2. Call PollFlightInfo(FlightDescriptor) to get a PollInfo message.

    A server should respond as quickly as possible on the first call, so the client shouldn’t have to wait for the first PollInfo response.

    If the query isn’t finished, PollInfo.flight_descriptor has a FlightDescriptor. The client should use the descriptor (not the original FlightDescriptor) to call the next PollFlightInfo(). A server should recognize a PollInfo.flight_descriptor that is not necessarily the latest in case the client misses an update in between.

    If the query is finished, PollInfo.flight_descriptor is unset.

    PollInfo.info contains the results available so far. It’s a complete FlightInfo each time, not just the delta between the previous and current FlightInfo. A server should only append to the endpoints in PollInfo.info each time. So the client can run DoGet(Ticket) with the Ticket in the PollInfo.info even when the query isn’t finished yet. FlightInfo.ordered is also valid.

    A server should not respond until the result would be different from last time. That way, the client can “long poll” for updates without constantly making requests. Clients can set a short timeout to avoid blocking calls if desired.

    PollInfo.progress may be set. It represents progress of the query. If it’s set, the value must be in [0.0, 1.0]. The value is not necessarily monotonic or nondecreasing. A server may respond by only updating the PollInfo.progress value, though it shouldn’t spam the client with updates.

    PollInfo.expiration_time is the expiration time for this request. After this passes, a server might not accept the poll descriptor anymore and the query may be cancelled. This may be updated on a call to PollFlightInfo. The expiration time is represented as google.protobuf.Timestamp.

    A client may be able to cancel the query by the CancelFlightInfo action.

    A server should return an error status instead of a response if the query fails. After an error, the client should not poll the request again, except for TIMED_OUT and UNAVAILABLE, which may not originate from the server.

  3. Consume each endpoint returned by the server, as before.
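The polling steps above can be sketched as a small loop. This is an illustration under simplifying assumptions, not a client for a real Flight library: `PollInfo` here is a hypothetical dataclass whose `endpoints` list stands in for `PollInfo.info`, tickets are bare byte strings, and `SCRIPT` is a made-up scripted server. Because servers only append endpoints, the client just remembers how many it has already handled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PollInfo:
    endpoints: List[bytes]              # tickets available so far (complete list)
    flight_descriptor: Optional[bytes]  # None once the query has finished
    progress: Optional[float] = None    # in [0.0, 1.0] when set


def poll_until_done(descriptor: bytes,
                    poll_flight_info: Callable[[bytes], PollInfo],
                    on_ticket: Callable[[bytes], None]) -> int:
    """Poll until the server stops returning a retry descriptor.

    Newly appended tickets are handed to on_ticket as soon as they
    appear, so fetching can start before the query completes.
    Returns the total number of tickets seen.
    """
    seen = 0
    current: Optional[bytes] = descriptor
    while current is not None:
        poll = poll_flight_info(current)
        for ticket in poll.endpoints[seen:]:
            on_ticket(ticket)
        seen = len(poll.endpoints)
        # Always continue with the descriptor from the response,
        # never with the original one.
        current = poll.flight_descriptor
    return seen


# Hypothetical scripted server: each descriptor maps to the next response.
SCRIPT = {
    b"query": PollInfo([], b"retry-1", 0.1),
    b"retry-1": PollInfo([b"t0"], b"retry-2", 0.6),
    b"retry-2": PollInfo([b"t0", b"t1"], None, 1.0),
}

fetched: List[bytes] = []
total = poll_until_done(b"query", SCRIPT.__getitem__, fetched.append)
```

A real client would also honor the expiration time and stop polling when the server returns an error status.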

Uploading Data#

To upload data, a client would:

sequenceDiagram
    autonumber
    participant Client
    participant Server
    Note right of Client: The first FlightData includes a FlightDescriptor
    Client->>Server: DoPut(FlightData)
    Client->>Server: stream of FlightData
    Server->>Client: PutResult{app_metadata}

Uploading data via DoPut.#

  1. Construct or acquire a FlightDescriptor, as before.

  2. Call DoPut(FlightData) and upload a stream of Arrow record batches.

    The FlightDescriptor is included with the first message so the server can identify the dataset.

DoPut allows the server to send response messages back to the client with custom metadata. This can be used to implement things like resumable writes (e.g. the server can periodically send a message indicating how many rows have been committed so far).
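The resumable-write idea can be sketched as follows. This is a toy model, not the behavior of any particular Flight server: `fake_do_put` and `rows_committed` are hypothetical names, batches are plain lists, and encoding the committed row count as ASCII digits in the response metadata is an assumption made for this example only.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fake_do_put(batches: Iterable[List[int]]) -> Iterator[bytes]:
    """Hypothetical server handler for an upload stream.

    After each batch it sends back the cumulative number of committed
    rows as application metadata (ASCII digits, an assumed encoding).
    """
    committed = 0
    for batch in batches:
        committed += len(batch)
        yield str(committed).encode()


def rows_committed(acks: Iterable[bytes]) -> int:
    """Return the row count from the last acknowledgement received."""
    last = 0
    for app_metadata in acks:
        last = int(app_metadata.decode())
    return last


batches = [[1, 2, 3], [4, 5], [6]]

# Complete upload: the final acknowledgement covers every row.
committed = rows_committed(fake_do_put(batches))

# Simulated interruption after two acknowledgements: the client learns
# how many rows are safe and could resume from that offset later.
partial = rows_committed(islice(fake_do_put(batches), 2))
```

With the row offset from `partial`, a client could skip already committed rows when it retries the upload.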

Exchanging Data#

Some use cases may require uploading and downloading data within a single call. While this can be emulated with multiple calls, this may be difficult if the application is stateful. For instance, the application may wish to implement a call where the client uploads data and the server responds with a transformation of that data; this would require being stateful if implemented using DoGet and DoPut. Instead, DoExchange allows this to be implemented as a single call. A client would:

sequenceDiagram
    autonumber
    participant Client
    participant Server
    Note right of Client: The first FlightData includes a FlightDescriptor
    Client->>Server: DoExchange(FlightData)
    par [Client sends data]
        Client->>Server: stream of FlightData
    and [Server sends data]
        Server->>Client: stream of FlightData
    end

Complex data flow with DoExchange.#

  1. Construct or acquire a FlightDescriptor, as before.

  2. Call DoExchange(FlightData).

    The FlightDescriptor is included with the first message, as with DoPut. At this point, both the client and the server may simultaneously stream data to the other side.
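To make the single-call shape concrete, the sketch below uses Python generators as a stand-in for the two streams. It is a deliberate simplification: `fake_do_exchange` and `client_batches` are hypothetical, the running-total transformation is invented for the example, and the generators run in lock step, whereas a real exchange lets both sides stream concurrently.

```python
from typing import Iterable, Iterator, List, Tuple

Batch = List[int]  # stand-in for an Arrow record batch


def fake_do_exchange(incoming: Iterable[Batch]) -> Iterator[Batch]:
    """Hypothetical server handler: reply to each uploaded batch with a
    one-element batch holding the running total of all values so far."""
    total = 0
    for batch in incoming:
        total += sum(batch)
        yield [total]


sent: List[Batch] = []


def client_batches() -> Iterator[Batch]:
    """Client upload stream; records what has been sent so far."""
    for batch in ([1, 2], [3], [4, 5, 6]):
        sent.append(batch)
        yield batch


# Each response arrives while the upload is still in progress, so the
# state (the running total) lives inside one call instead of being
# carried across separate upload and download calls.
received: List[Tuple[int, Batch]] = []
for response in fake_do_exchange(client_batches()):
    received.append((len(sent), response))
```

Each entry of `received` pairs the number of batches uploaded at that moment with the server's reply, showing that replies are interleaved with the upload.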

Authentication#

Flight supports a variety of authentication methods that applications can customize for their needs.

“Handshake” authentication

This is implemented in two parts. At connection time, the client calls the Handshake RPC method, and the application-defined authentication handler can exchange any number of messages with its counterpart on the server. The handler then provides a binary token. The Flight client will then include this token in the headers of all future calls, where it is validated by the server authentication handler.

Applications may use any part of this; for instance, they may ignore the initial handshake and send an externally acquired token (e.g. a bearer token) on each call, or they may establish trust during the handshake and not validate a token for each call, treating the connection as stateful (a “login” pattern).

Warning

Unless a token is validated on every call, this pattern is not secure, especially in the presence of a layer 7 load balancer, as is common with gRPC, or if gRPC transparently reconnects the client.
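A minimal sketch of per-call validation, assuming a token format invented for this example (username plus an HMAC tag): `issue_token` plays the role of the handshake result and `handle_call` validates the token on every call instead of trusting the connection. None of these names come from a Flight library, the `authorization` header key is an assumption, and a production design would also need expiry and key rotation.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-deployment secret


def _mac(user: bytes) -> bytes:
    # Hex-encoded HMAC-SHA256 tag over the username.
    return hmac.new(SERVER_KEY, user, hashlib.sha256).hexdigest().encode()


def issue_token(username: str) -> bytes:
    """What a handshake handler might return after checking credentials."""
    user = username.encode()
    return user + b":" + _mac(user)


def validate_token(token: bytes) -> str:
    """Check the token's integrity; raise if it is missing or tampered with."""
    user, _, mac = token.rpartition(b":")
    if not user or not hmac.compare_digest(mac, _mac(user)):
        raise PermissionError("UNAUTHENTICATED")
    return user.decode()


def handle_call(headers: dict, method: str) -> str:
    """Validate on every call, so a reconnect or a load balancer that
    routes the call to another backend cannot bypass authentication."""
    user = validate_token(headers.get("authorization", b""))
    return f"{method} ok for {user}"


token = issue_token("alice")
reply = handle_call({"authorization": token}, "DoGet")
```

Because validation depends only on the token and the server key, any backend holding the key can accept the call without per-connection state.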

Header-based/middleware-based authentication

Clients may include custom headers with calls. Custom middleware can then be implemented to validate and accept/reject calls on the server side.

Mutual TLS (mTLS)

The client provides a certificate during connection establishment which is verified by the server. The application does not need to implement any authentication code, but must provision and distribute certificates.

This may only be available in certain implementations, and is only available when TLS is also enabled.

Some Flight implementations may expose the underlying gRPC API as well, in which case any authentication method supported by gRPC is available.

Location URIs#

Flight is primarily defined in terms of its Protobuf and gRPC specification below, but Arrow implementations may also support alternative transports (see Flight RPC). Clients and servers need to know which transport to use for a given URI in a Location, so Flight implementations should use the following URI schemes for the given transports:

| Transport | URI Scheme |
|---|---|
| gRPC (plaintext) | grpc: or grpc+tcp: |
| gRPC (TLS) | grpc+tls: |
| gRPC (Unix domain socket) | grpc+unix: |
| (reuse connection) | arrow-flight-reuse-connection: |
| HTTP (1) | http: or https: |

Notes:

  • (1) See Extended Location URIs for semantics when using http/https as the transport. It should be accessible via a GET request.
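The scheme table can be applied mechanically. The sketch below uses Python's `urllib.parse.urlsplit` to extract the scheme and look up the transport; `TRANSPORT_BY_SCHEME` and `transport_for` are hypothetical names, and the host names in the usage note are made up.

```python
from urllib.parse import urlsplit

# Mirrors the table above; keys are URI schemes without the trailing colon.
TRANSPORT_BY_SCHEME = {
    "grpc": "gRPC (plaintext)",
    "grpc+tcp": "gRPC (plaintext)",
    "grpc+tls": "gRPC (TLS)",
    "grpc+unix": "gRPC (Unix domain socket)",
    "arrow-flight-reuse-connection": "(reuse connection)",
    "http": "HTTP",
    "https": "HTTP",
}


def transport_for(uri: str) -> str:
    """Return the transport named by a Location URI's scheme.

    Raises ValueError for schemes that are not in the table, so that
    unsupported locations are rejected instead of guessed at.
    """
    scheme = urlsplit(uri).scheme
    try:
        return TRANSPORT_BY_SCHEME[scheme]
    except KeyError:
        raise ValueError(f"unsupported location scheme: {scheme!r}") from None
```

For example, `transport_for("grpc+tls://data.example.com:8815")` selects the TLS transport, while an object-store scheme such as `s3://bucket/key` is rejected because only the schemes in the table are defined.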

Connection Reuse#

“Reuse connection” above is not a particular transport. Instead, it means that the client may try to execute DoGet against the same server (and through the same connection) that it originally obtained the FlightInfo from (i.e., that it called GetFlightInfo against). This is interpreted the same way as when no specific Location is returned.

This allows the server to return “itself” as one possible location to fetch data without having to know its own public address, which can be useful in deployments where knowing this would be difficult or impossible. For example, a developer may forward a remote service in a cloud environment to their local machine; in this case, the remote service would have no way to know the local hostname and port that it is being accessed over.

For compatibility reasons, the URI should always be arrow-flight-reuse-connection://?, with the trailing empty query string. Java’s URI implementation does not accept scheme: or scheme://, and C++’s implementation does not accept an empty string, so the obvious candidates are not compatible. The chosen representation can be parsed by both implementations, as well as Go’s net/url and Python’s urllib.parse.
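As a sketch of how a client might act on this, the function below decides between reusing the current connection and dialing a new location. `pick_location` is a hypothetical helper, and preferring reuse whenever it is offered is just one possible policy; a client could equally spread load across the other locations.

```python
from typing import List, Optional
from urllib.parse import urlsplit

REUSE_URI = "arrow-flight-reuse-connection://?"
REUSE_SCHEME = "arrow-flight-reuse-connection"


def pick_location(locations: List[str]) -> Optional[str]:
    """Return the URI to connect to, or None to reuse the connection
    on which the FlightInfo was obtained.

    An empty location list and the special reuse URI are treated the
    same way, as described above.
    """
    if not locations:
        return None
    for uri in locations:
        if urlsplit(uri).scheme == REUSE_SCHEME:
            return None  # policy choice in this sketch: prefer reuse
    return locations[0]


# urllib.parse accepts the chosen representation and yields empty
# authority, path, and query components.
parsed = urlsplit(REUSE_URI)
```

Comparing on the parsed scheme rather than on the whole string keeps the check independent of how the rest of the URI is spelled.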

Extended Location URIs#

In addition to alternative transports, a server may also return URIs that reference an external service or object storage location. This can be useful in cases where intermediate data is cached as Apache Parquet files on cloud storage or is otherwise accessible via an HTTP service. In these scenarios, it is more efficient to be able to provide a URI where the client may simply download the data directly, rather than requiring a Flight service to read it back into memory and serve it from a DoGet request.

To avoid the complexities of Flight clients having to implement support for multiple different cloud storage vendors (e.g. AWS S3, Google Cloud), we extend the URIs to only allow an HTTP/HTTPS URI where the client can perform a simple GET request to download the data. Authentication can be handled either by negotiating externally to the Flight protocol or by the server sending a presigned URL that the client can make a GET request to. This should be supported by all current major cloud storage vendors, meaning only the server needs to know the semantics of the underlying object store APIs.

When using an extended location URI, the client should ignore any value in the Ticket field of the FlightEndpoint. The Ticket is only used for identifying data in the context of a Flight service, and is not needed when the client is directly downloading data from an external service.

Clients should assume that, unless otherwise specified, the data is being returned in the Arrow IPC format (see Serialization and Interprocess Communication (IPC)), just as it would be via a DoGet call. If the returned Content-Type header is a generic media type such as application/octet-stream, the client should still assume it is an Arrow IPC stream. For other media types, such as Apache Parquet, the server should use the appropriate IANA Media Type that a client would recognize.

Finally, the server may also allow the client to choose what format the data is returned in by respecting the Accept header in the request. If multiple formats are requested and supported, the choice of which to use is server-specific. If none of the requested content-types are supported, the server may respond with either 406 (Not Acceptable), 415 (Unsupported Media Type), or successfully respond with a different format that it does support, along with the correct Content-Type header.
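On the client side, the Content-Type rules above amount to a small dispatch. The sketch below is an assumption-laden illustration: `payload_format` is a hypothetical helper, its return labels are invented, and the two media type strings are what I believe to be the registered types for Arrow IPC streams and Parquet, which should be verified against the IANA registry before relying on them.

```python
from typing import Optional

# Assumed media type strings; verify against the IANA registry.
ARROW_STREAM = "application/vnd.apache.arrow.stream"
PARQUET = "application/vnd.apache.parquet"
GENERIC = {"", "application/octet-stream"}


def payload_format(content_type: Optional[str]) -> str:
    """Decide how to decode the body of a GET response.

    A missing or generic Content-Type falls back to an Arrow IPC
    stream, matching what a DoGet call would have returned.
    """
    # Drop parameters such as "; charset=..." and normalize case.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type in GENERIC or media_type == ARROW_STREAM:
        return "arrow-ipc-stream"
    if media_type == PARQUET:
        return "parquet"
    raise ValueError(f"unrecognized media type: {media_type!r}")
```

A client that sends an Accept header should still run this check on the response, since the server may answer with a different format than the one requested.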

Error Handling#

Arrow Flight defines its own set of error codes. The implementation differs between languages (e.g. in C++, Unimplemented is a general Arrow error status while it’s a Flight-specific exception in Java), but the following set is exposed:

| Error Code | Description |
|---|---|
| UNKNOWN | An unknown error. The default if no other error applies. |
| INTERNAL | An error internal to the service implementation occurred. |
| INVALID_ARGUMENT | The client passed an invalid argument to the RPC. |
| TIMED_OUT | The operation exceeded a timeout or deadline. |
| NOT_FOUND | The requested resource (action, data stream) was not found. |
| ALREADY_EXISTS | The resource already exists. |
| CANCELLED | The operation was cancelled (either by the client or the server). |
| UNAUTHENTICATED | The client is not authenticated. |
| UNAUTHORIZED | The client is authenticated, but does not have permissions for the requested operation. |
| UNIMPLEMENTED | The RPC is not implemented. |
| UNAVAILABLE | The server is not available. May be emitted by the client for connectivity reasons. |
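One way an application might use these codes is to separate errors worth retrying from errors that are final. The sketch below treats only TIMED_OUT and UNAVAILABLE as retryable, following the polling guidance earlier in this document; applying that rule to arbitrary calls is an assumption of this example. `FlightErrorCode`, `FlightError`, and `call_with_retry` are hypothetical names, not the error types of any Flight implementation.

```python
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class FlightErrorCode(Enum):
    UNKNOWN = "UNKNOWN"
    INTERNAL = "INTERNAL"
    INVALID_ARGUMENT = "INVALID_ARGUMENT"
    TIMED_OUT = "TIMED_OUT"
    NOT_FOUND = "NOT_FOUND"
    ALREADY_EXISTS = "ALREADY_EXISTS"
    CANCELLED = "CANCELLED"
    UNAUTHENTICATED = "UNAUTHENTICATED"
    UNAUTHORIZED = "UNAUTHORIZED"
    UNIMPLEMENTED = "UNIMPLEMENTED"
    UNAVAILABLE = "UNAVAILABLE"


class FlightError(Exception):
    """Illustrative exception carrying one of the codes above."""

    def __init__(self, code: FlightErrorCode, message: str = "") -> None:
        super().__init__(f"{code.name}: {message}")
        self.code = code


# Codes that may not originate from the server, so a retry can succeed.
RETRYABLE = frozenset({FlightErrorCode.TIMED_OUT, FlightErrorCode.UNAVAILABLE})


def call_with_retry(operation: Callable[[], T], max_attempts: int = 3) -> T:
    """Run operation, retrying only on retryable codes.

    Any other code, or the last failed attempt, is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except FlightError as error:
            if error.code not in RETRYABLE or attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```

A real client would usually add a delay between attempts; that is omitted here to keep the sketch short.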

External Resources#

Protocol Buffer Definitions#

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 * <p>
 * http://www.apache.org/licenses/LICENSE-2.0
 * <p>
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

syntax = "proto3";
import "google/protobuf/timestamp.proto";

option java_package = "org.apache.arrow.flight.impl";
option go_package = "github.com/apache/arrow-go/arrow/flight/gen/flight";
option csharp_namespace = "Apache.Arrow.Flight.Protocol";

package arrow.flight.protocol;

/*
 * A flight service is an endpoint for retrieving or storing Arrow data. A
 * flight service can expose one or more predefined endpoints that can be
 * accessed using the Arrow Flight Protocol. Additionally, a flight service
 * can expose a set of actions that are available.
 */
service FlightService {

  /*
   * Handshake between client and server. Depending on the server, the
   * handshake may be required to determine the token that should be used for
   * future operations. Both request and response are streams to allow multiple
   * round-trips depending on auth mechanism.
   */
  rpc Handshake(stream HandshakeRequest) returns (stream HandshakeResponse) {}

  /*
   * Get a list of available streams given a particular criteria. Most flight
   * services will expose one or more streams that are readily available for
   * retrieval. This api allows listing the streams available for
   * consumption. A user can also provide a criteria. The criteria can limit
   * the subset of streams that can be listed via this interface. Each flight
   * service allows its own definition of how to consume criteria.
   */
  rpc ListFlights(Criteria) returns (stream FlightInfo) {}

  /*
   * For a given FlightDescriptor, get information about how the flight can be
   * consumed. This is a useful interface if the consumer of the interface
   * already can identify the specific flight to consume. This interface can
   * also allow a consumer to generate a flight stream through a specified
   * descriptor. For example, a flight descriptor might be something that
   * includes a SQL statement or a Pickled Python operation that will be
   * executed. In those cases, the descriptor will not be previously available
   * within the list of available streams provided by ListFlights but will be
   * available for consumption for the duration defined by the specific flight
   * service.
   */
  rpc GetFlightInfo(FlightDescriptor) returns (FlightInfo) {}

  /*
   * For a given FlightDescriptor, start a query and get information
   * to poll its execution status. This is a useful interface if the
   * query may be a long-running query. The first PollFlightInfo call
   * should return as quickly as possible. (GetFlightInfo doesn't
   * return until the query is complete.)
   *
   * A client can consume any available results before
   * the query is completed. See PollInfo.info for details.
   *
   * A client can poll the updated query status by calling
   * PollFlightInfo() with PollInfo.flight_descriptor. A server
   * should not respond until the result would be different from last
   * time. That way, the client can "long poll" for updates
   * without constantly making requests. Clients can set a short timeout
   * to avoid blocking calls if desired.
   *
   * A client can't use PollInfo.flight_descriptor after
   * PollInfo.expiration_time passes. A server might not accept the
   * retry descriptor anymore and the query may be cancelled.
   *
   * A client may use the CancelFlightInfo action with
   * PollInfo.info to cancel the running query.
   */
  rpc PollFlightInfo(FlightDescriptor) returns (PollInfo) {}

  /*
   * For a given FlightDescriptor, get the Schema as described in Schema.fbs::Schema
   * This is used when a consumer needs the Schema of flight stream. Similar to
   * GetFlightInfo this interface may generate a new flight that was not previously
   * available in ListFlights.
   */
  rpc GetSchema(FlightDescriptor) returns (SchemaResult) {}

  /*
   * Retrieve a single stream associated with a particular descriptor
   * associated with the referenced ticket. A Flight can be composed of one or
   * more streams where each stream can be retrieved using a separate opaque
   * ticket that the flight service uses for managing a collection of streams.
   */
  rpc DoGet(Ticket) returns (stream FlightData) {}

  /*
   * Push a stream to the flight service associated with a particular
   * flight stream. This allows a client of a flight service to upload a stream
   * of data. Depending on the particular flight service, a client consumer
   * could be allowed to upload a single stream per descriptor or an unlimited
   * number. In the latter, the service might implement a 'seal' action that
   * can be applied to a descriptor once all streams are uploaded.
   */
  rpc DoPut(stream FlightData) returns (stream PutResult) {}

  /*
   * Open a bidirectional data channel for a given descriptor. This
   * allows clients to send and receive arbitrary Arrow data and
   * application-specific metadata in a single logical stream. In
   * contrast to DoGet/DoPut, this is more suited for clients
   * offloading computation (rather than storage) to a Flight service.
   */
  rpc DoExchange(stream FlightData) returns (stream FlightData) {}

  /*
   * Flight services can support an arbitrary number of simple actions in
   * addition to the possible ListFlights, GetFlightInfo, DoGet, DoPut
   * operations that are potentially available. DoAction allows a flight client
   * to do a specific action against a flight service. An action includes
   * opaque request and response objects that are specific to the type action
   * being undertaken.
   */
  rpc DoAction(Action) returns (stream Result) {}

  /*
   * A flight service exposes all of the available action types that it has
   * along with descriptions. This allows different flight consumers to
   * understand the capabilities of the flight service.
   */
  rpc ListActions(Empty) returns (stream ActionType) {}
}

/*
 * The request that a client provides to a server on handshake.
 */
message HandshakeRequest {

  /*
   * A defined protocol version
   */
  uint64 protocol_version = 1;

  /*
   * Arbitrary auth/handshake info.
   */
  bytes payload = 2;
}

message HandshakeResponse {

  /*
   * A defined protocol version
   */
  uint64 protocol_version = 1;

  /*
   * Arbitrary auth/handshake info.
   */
  bytes payload = 2;
}

/*
 * A message for doing simple auth.
 */
message BasicAuth {
  string username = 2;
  string password = 3;
}

message Empty {}

/*
 * Describes an available action, including both the name used for execution
 * along with a short description of the purpose of the action.
 */
message ActionType {
  string type = 1;
  string description = 2;
}

/*
 * A service specific expression that can be used to return a limited set
 * of available Arrow Flight streams.
 */
message Criteria {
  bytes expression = 1;
}

/*
 * An opaque action specific for the service.
 */
message Action {
  string type = 1;
  bytes body = 2;
}

/*
 * An opaque result returned after executing an action.
 */
message Result {
  bytes body = 1;
}

/*
 * Wrap the result of a getSchema call
 */
message SchemaResult {
  // The schema of the dataset in its IPC form:
  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
  //   4 bytes - the byte length of the payload
  //   a flatbuffer Message whose header is the Schema
  bytes schema = 1;
}

/*
 * The name or tag for a Flight. May be used as a way to retrieve or generate
 * a flight or be used to expose a set of previously defined flights.
 */
message FlightDescriptor {

  /*
   * Describes what type of descriptor is defined.
   */
  enum DescriptorType {

    // Protobuf pattern, not used.
    UNKNOWN = 0;

    /*
     * A named path that identifies a dataset. A path is composed of a string
     * or list of strings describing a particular dataset. This is conceptually
     *  similar to a path inside a filesystem.
     */
    PATH = 1;

    /*
     * An opaque command to generate a dataset.
     */
    CMD = 2;
  }

  DescriptorType type = 1;

  /*
   * Opaque value used to express a command. Should only be defined when
   * type = CMD.
   */
  bytes cmd = 2;

  /*
   * List of strings identifying a particular dataset. Should only be defined
   * when type = PATH.
   */
  repeated string path = 3;
}

/*
 * The access coordinates for retrieval of a dataset. With a FlightInfo, a
 * consumer is able to determine how to retrieve a dataset.
 */
message FlightInfo {
  // The schema of the dataset in its IPC form:
  //   4 bytes - an optional IPC_CONTINUATION_TOKEN prefix
  //   4 bytes - the byte length of the payload
  //   a flatbuffer Message whose header is the Schema
  bytes schema = 1;

  /*
   * The descriptor associated with this info.
   */
  FlightDescriptor flight_descriptor = 2;

  /*
   * A list of endpoints associated with the flight. To consume the
   * whole flight, all endpoints (and hence all Tickets) must be
   * consumed. Endpoints can be consumed in any order.
   *
   * In other words, an application can use multiple endpoints to
   * represent partitioned data.
   *
   * If the returned data has an ordering, an application can use
   * "FlightInfo.ordered = true" or should return the all data in a
   * single endpoint. Otherwise, there is no ordering defined on
   * endpoints or the data within.
   *
   * A client can read ordered data by reading data from returned
   * endpoints, in order, from front to back.
   *
   * Note that a client may ignore "FlightInfo.ordered = true". If an
   * ordering is important for an application, an application must
   * choose one of them:
   *
   * * An application requires that all clients must read data in
   *   returned endpoints order.
   * * An application must return the all data in a single endpoint.
   */
  repeated FlightEndpoint endpoint = 3;

  // Set these to -1 if unknown.
  int64 total_records = 4;
  int64 total_bytes = 5;

  /*
   * FlightEndpoints are in the same order as the data.
   */
  bool ordered = 6;

  /*
   * Application-defined metadata.
   *
   * There is no inherent or required relationship between this
   * and the app_metadata fields in the FlightEndpoints or resulting
   * FlightData messages. Since this metadata is application-defined,
   * a given application could define there to be a relationship,
   * but there is none required by the spec.
   */
  bytes app_metadata = 7;
}

/*
 * The information to process a long-running query.
 */
message PollInfo {
  /*
   * The currently available results.
   *
   * If "flight_descriptor" is not specified, the query is complete
   * and "info" specifies all results. Otherwise, "info" contains
   * partial query results.
   *
   * Note that each PollInfo response contains a complete
   * FlightInfo (not just the delta between the previous and current
   * FlightInfo).
   *
   * Subsequent PollInfo responses may only append new endpoints to
   * info.
   *
   * Clients can begin fetching results via DoGet(Ticket) with the
   * ticket in the info before the query is
   * completed. FlightInfo.ordered is also valid.
   */
  FlightInfo info = 1;

  /*
   * The descriptor the client should use on the next try.
   * If unset, the query is complete.
   */
  FlightDescriptor flight_descriptor = 2;

  /*
   * Query progress. If known, must be in [0.0, 1.0] but need not be
   * monotonic or nondecreasing. If unknown, do not set.
   */
  optional double progress = 3;

  /*
   * Expiration time for this request. After this passes, the server
   * might not accept the retry descriptor anymore (and the query may
   * be cancelled). This may be updated on a call to PollFlightInfo.
   */
  google.protobuf.Timestamp expiration_time = 4;
}

/*
 * The request of the CancelFlightInfo action.
 *
 * The request should be stored in Action.body.
 */
message CancelFlightInfoRequest {
  FlightInfo info = 1;
}

/*
 * The result of a cancel operation.
 *
 * This is used by CancelFlightInfoResult.status.
 */
enum CancelStatus {
  // The cancellation status is unknown. Servers should avoid using
  // this value (send a NOT_FOUND error if the requested query is
  // not known). Clients can retry the request.
  CANCEL_STATUS_UNSPECIFIED = 0;
  // The cancellation request is complete. Subsequent requests with
  // the same payload may return CANCELLED or a NOT_FOUND error.
  CANCEL_STATUS_CANCELLED = 1;
  // The cancellation request is in progress. The client may retry
  // the cancellation request.
  CANCEL_STATUS_CANCELLING = 2;
  // The query is not cancellable. The client should not retry the
  // cancellation request.
  CANCEL_STATUS_NOT_CANCELLABLE = 3;
}

/*
 * The result of the CancelFlightInfo action.
 *
 * The result should be stored in Result.body.
 */
message CancelFlightInfoResult {
  CancelStatus status = 1;
}

/*
 * An opaque identifier that the service can use to retrieve a particular
 * portion of a stream.
 *
 * Tickets are meant to be single use. It is an error/application-defined
 * behavior to reuse a ticket.
 */
message Ticket {
  bytes ticket = 1;
}

/*
 * A location to retrieve a particular stream from. This URI should be one of
 * the following:
 *  - An empty string or the string 'arrow-flight-reuse-connection://?':
 *    indicating that the ticket can be redeemed on the service where the
 *    ticket was generated via a DoGet request.
 *  - A valid grpc URI (grpc://, grpc+tls://, grpc+unix://, etc.):
 *    indicating that the ticket can be redeemed on the service at the given
 *    URI via a DoGet request.
 *  - A valid HTTP URI (http://, https://, etc.):
 *    indicating that the client should perform a GET request against the
 *    given URI to retrieve the stream. The ticket should be empty
 *    in this case and should be ignored by the client. Cloud object storage
 *    can be utilized by presigned URLs or mediating the auth separately and
 *    returning the full URL (e.g.
https://amzn-s3-demo-bucket.s3.us-west-2.amazonaws.com/...).443 *444 * We allow non-Flight URIs for the purpose of allowing Flight services to indicate that445 * results can be downloaded in formats other than Arrow (such as Parquet) or to allow446 * direct fetching of results from a URI to reduce excess copying and data movement.447 * In these cases, the following conventions should be followed by servers and clients:448 *449 *  - Unless otherwise specified by the 'Content-Type' header of the response,450 *    a client should assume the response is using the Arrow IPC Streaming format.451 *    Usage of an IANA media type like 'application/octet-stream' should be assumed to452 *    be using the Arrow IPC Streaming format.453 *  - The server may allow the client to choose a specific response format by454 *    specifying an 'Accept' header in the request, such as 'application/vnd.apache.parquet'455 *    or 'application/vnd.apache.arrow.stream'. If multiple types are requested and456 *    supported by the server, the choice of which to use is server-specific. 
If457 *    none of the requested content-types are supported, the server may respond with458 *    either 406 (Not Acceptable) or 415 (Unsupported Media Type), or successfully459 *    respond with a different format that it does support along with the correct460 *    'Content-Type' header.461 *462 * Note: new schemes may be proposed in the future to allow for more flexibility based463 * on community requests.464 */465messageLocation{466stringuri=1;467}468469/*470 * A particular stream or split associated with a flight.471 */472messageFlightEndpoint{473474/*475   * Token used to retrieve this stream.476   */477Ticketticket=1;478479/*480   * A list of URIs where this ticket can be redeemed via DoGet().481   *482   * If the list is empty, the expectation is that the ticket can only483   * be redeemed on the current service where the ticket was484   * generated.485   *486   * If the list is not empty, the expectation is that the ticket can be487   * redeemed at any of the locations, and that the data returned will be488   * equivalent. In this case, the ticket may only be redeemed at one of the489   * given locations, and not (necessarily) on the current service. If one490   * of the given locations is "arrow-flight-reuse-connection://?", the491   * client may redeem the ticket on the service where the ticket was492   * generated (i.e., the same as above), in addition to the other493   * locations. (This URI was chosen to maximize compatibility, as 'scheme:'494   * or 'scheme://' are not accepted by Java's java.net.URI.)495   *496   * In other words, an application can use multiple locations to497   * represent redundant and/or load balanced services.498   */499repeatedLocationlocation=2;500501/*502   * Expiration time of this stream. If present, clients may assume503   * they can retry DoGet requests. 
Otherwise, it is504   * application-defined whether DoGet requests may be retried.505   */506google.protobuf.Timestampexpiration_time=3;507508/*509   * Application-defined metadata.510   *511   * There is no inherent or required relationship between this512   * and the app_metadata fields in the FlightInfo or resulting513   * FlightData messages. Since this metadata is application-defined,514   * a given application could define there to be a relationship,515   * but there is none required by the spec.516   */517bytesapp_metadata=4;518}519520/*521 * The request of the RenewFlightEndpoint action.522 *523 * The request should be stored in Action.body.524 */525messageRenewFlightEndpointRequest{526FlightEndpointendpoint=1;527}528529/*530 * A batch of Arrow data as part of a stream of batches.531 */532messageFlightData{533534/*535   * The descriptor of the data. This is only relevant when a client is536   * starting a new DoPut stream.537   */538FlightDescriptorflight_descriptor=1;539540/*541   * Header for message data as described in Message.fbs::Message.542   */543bytesdata_header=2;544545/*546   * Application-defined metadata.547   */548bytesapp_metadata=3;549550/*551   * The actual batch of Arrow data. 
Preferably handled with minimal-copies552   * coming last in the definition to help with sidecar patterns (it is553   * expected that some implementations will fetch this field off the wire554   * with specialized code to avoid extra memory copies).555   */556bytesdata_body=1000;557}558559/**560 * The response message associated with the submission of a DoPut.561 */562messagePutResult{563bytesapp_metadata=1;564}565566/*567 * EXPERIMENTAL: Union of possible value types for a Session Option to be set to.568 *569 * By convention, an attempt to set a valueless SessionOptionValue should570 * attempt to unset or clear the named option value on the server.571 */572messageSessionOptionValue{573messageStringListValue{574repeatedstringvalues=1;575}576577oneofoption_value{578stringstring_value=1;579boolbool_value=2;580sfixed64int64_value=3;581doubledouble_value=4;582StringListValuestring_list_value=5;583}584}585586/*587 * EXPERIMENTAL: A request to set session options for an existing or new (implicit)588 * server session.589 *590 * Sessions are persisted and referenced via a transport-level state management, typically591 * RFC 6265 HTTP cookies when using an HTTP transport.  The suggested cookie name or state592 * context key is 'arrow_flight_session_id', although implementations may freely choose their593 * own name.594 *595 * Session creation (if one does not already exist) is implied by this RPC request, however596 * server implementations may choose to initiate a session that also contains client-provided597 * session options at any other time, e.g. 
on authentication, or when any other call is made598 * and the server wishes to use a session to persist any state (or lack thereof).599 */600messageSetSessionOptionsRequest{601map<string,SessionOptionValue>session_options=1;602}603604/*605 * EXPERIMENTAL: The results (individually) of setting a set of session options.606 *607 * Option names should only be present in the response if they were not successfully608 * set on the server; that is, a response without an Error for a name provided in the609 * SetSessionOptionsRequest implies that the named option value was set successfully.610 */611messageSetSessionOptionsResult{612enumErrorValue{613// Protobuf deserialization fallback value: The status is unknown or unrecognized.614// Servers should avoid using this value. The request may be retried by the client.615UNSPECIFIED=0;616// The given session option name is invalid.617INVALID_NAME=1;618// The session option value or type is invalid.619INVALID_VALUE=2;620// The session option cannot be set.621ERROR=3;622}623624messageError{625ErrorValuevalue=1;626}627628map<string,Error>errors=1;629}630631/*632 * EXPERIMENTAL: A request to access the session options for the current server session.633 *634 * The existing session is referenced via a cookie header or similar (see635 * SetSessionOptionsRequest above); it is an error to make this request with a missing,636 * invalid, or expired session cookie header or other implementation-defined session637 * reference token.638 */639messageGetSessionOptionsRequest{640}641642/*643 * EXPERIMENTAL: The result containing the current server session options.644 */645messageGetSessionOptionsResult{646map<string,SessionOptionValue>session_options=1;647}648649/*650 * Request message for the "Close Session" action.651 *652 * The exiting session is referenced via a cookie header.653 */654messageCloseSessionRequest{655}656657/*658 * The result of closing a session.659 */660messageCloseSessionResult{661enumStatus{662// Protobuf deserialization 
fallback value: The session close status is unknown or663// not recognized. Servers should avoid using this value (send a NOT_FOUND error if664// the requested session is not known or expired). Clients can retry the request.665UNSPECIFIED=0;666// The session close request is complete. Subsequent requests with667// the same session produce a NOT_FOUND error.668CLOSED=1;669// The session close request is in progress. The client may retry670// the close request.671CLOSING=2;672// The session is not closeable. The client should not retry the673// close request.674NOT_CLOSEABLE=3;675}676677Statusstatus=1;678}
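The comments on `Location` and `FlightEndpoint.location` describe how a client decides where a ticket may be redeemed. As an illustration only, the following Python sketch classifies a `Location.uri` string according to those rules, using just the standard library. The function names `classify_location` and `redeemable_on_current_service` and the category labels are hypothetical and not part of any Flight client API. The sketch treats only `http` and `https` as HTTP schemes, which is an assumption, since the comment leaves the list open with "etc.".

```python
from urllib.parse import urlsplit

# Special URI named in the Location comment.
REUSE_CONNECTION_URI = "arrow-flight-reuse-connection://?"


def classify_location(uri: str) -> str:
    """Return 'reuse', 'grpc', 'http', or 'unknown' for a Location.uri value.

    The labels are hypothetical; they only mirror the three cases listed
    in the Location comment.
    """
    # Empty string or the reuse-connection URI: redeem the ticket via DoGet
    # on the service where the ticket was generated.
    if uri == "" or uri == REUSE_CONNECTION_URI:
        return "reuse"
    scheme = urlsplit(uri).scheme.lower()
    # grpc://, grpc+tls://, grpc+unix://, ...: redeem via DoGet at that URI.
    if scheme == "grpc" or scheme.startswith("grpc+"):
        return "grpc"
    # HTTP URIs: perform a GET request and ignore the ticket.
    # Assumption: only http and https are recognized here.
    if scheme in ("http", "https"):
        return "http"
    return "unknown"


def redeemable_on_current_service(location_uris: list[str]) -> bool:
    """Apply the FlightEndpoint.location rule for the originating service."""
    # An empty list means the ticket can only be redeemed on the service
    # that generated it.
    if not location_uris:
        return True
    # Otherwise the current service is a valid target only if one of the
    # locations explicitly allows reusing the connection.
    return any(classify_location(u) == "reuse" for u in location_uris)
```

A client might call `classify_location` on each location of an endpoint and choose between a `DoGet` call and a plain HTTP GET accordingly; how to handle an `unknown` scheme is left to the application.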