Use the BigQuery Storage Read API to read table data
The BigQuery Storage Read API provides fast access to BigQuery-managed storage by using an RPC-based protocol.
Background
Historically, users of BigQuery have had two mechanisms for accessing BigQuery-managed table data:
- Record-based paginated access by using the tabledata.list or jobs.getQueryResults REST API methods. The BigQuery API provides structured row responses in a paginated fashion appropriate for small result sets.
- Bulk data export using BigQuery extract jobs that export table data to Cloud Storage in a variety of file formats such as CSV, JSON, and Avro. Table exports are limited by daily quotas and by the batch nature of the export process.
The BigQuery Storage Read API provides a third option that represents an improvement over prior options. When you use the Storage Read API, structured data is sent over the wire in a binary serialization format. This allows for additional parallelism among multiple consumers for a set of results.
The Storage Read API does not provide functionality related to managing BigQuery resources such as datasets, jobs, or tables.
Key features
- Multiple Streams: The Storage Read API allows consumers to read disjoint sets of rows from a table using multiple streams within a session. This facilitates consumption from distributed processing frameworks or from independent consumer threads within a single client.
- Column Projection: At session creation, users can select an optional subset of columns to read. This allows efficient reads when tables contain many columns.
- Column Filtering: Users may provide simple filter predicates to enable filtration of data on the server side before transmission to a client.
- Snapshot Consistency: Storage sessions read based on a snapshot isolation model. All consumers read based on a specific point in time. The default snapshot time is based on the session creation time, but consumers may read data from an earlier snapshot.
Enabling the API
The Storage Read API is distinct from the BigQuery API, and shows up separately in the Google Cloud console as the BigQuery Storage API. However, the Storage Read API is enabled in all projects in which the BigQuery API is enabled; no additional activation steps are required.
Permissions
To get the permissions that you need to create and update read sessions, ask your administrator to grant you the Read Session User (bigquery.readSessionUser) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create and update read sessions. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create and update read sessions:
- bigquery.readsessions.create on the project
- bigquery.readsessions.getData on the table or higher
- bigquery.readsessions.update on the table or higher
You might also be able to get these permissions with custom roles or other predefined roles.
For more information about BigQuery roles and permissions, see BigQuery IAM roles and permissions.
Basic API flow
This section describes the basic flow of using the Storage Read API. For examples, see the libraries and samples page.
Create a session
Storage Read API usage begins with the creation of a read session. The maximum number of streams, the snapshot time, the set of columns to return, and the predicate filter are all specified as part of the ReadSession message supplied to the CreateReadSession RPC.
The ReadSession response contains a set of Stream identifiers. When a read session is created, the server determines the amount of data that can be read in the context of the session and creates one or more streams, each of which represents approximately the same amount of table data to be scanned. This means that, to read all the data from a table, callers must read from all Stream identifiers returned in the ReadSession response. This is a change from earlier versions of the API, in which no limit existed on the amount of data that could be read in a single stream context.
The ReadSession response contains a reference schema for the session and a list of available Stream identifiers. Sessions expire automatically and do not require any cleanup or finalization. The expiration time is returned as part of the ReadSession response and is guaranteed to be at least 6 hours from session creation time.
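As an illustration of this step, the following sketch creates a read session with the Python client library (google-cloud-bigquery-storage). The project ID is a placeholder, and the table, selected columns, and row restriction are only examples of the optional ReadSession settings described above.

```python
from google.cloud import bigquery_storage
from google.cloud.bigquery_storage import types

# Placeholder project; replace with your own.
project_id = "your-project-id"

# Fully qualified table path: projects/{p}/datasets/{d}/tables/{t}.
table_path = (
    "projects/bigquery-public-data/datasets/usa_names/tables/usa_1910_current"
)

client = bigquery_storage.BigQueryReadClient()

requested_session = types.ReadSession(
    table=table_path,
    data_format=types.DataFormat.AVRO,  # or types.DataFormat.ARROW
    read_options=types.ReadSession.TableReadOptions(
        # Optional column projection and server-side row filter.
        selected_fields=["name", "number", "state"],
        row_restriction='state = "WA"',
    ),
    # A snapshot time for reading an earlier table snapshot can also be
    # set here through the table_modifiers field.
)

session = client.create_read_session(
    parent=f"projects/{project_id}",
    read_session=requested_session,
    # Upper bound on the number of streams; the server may create fewer.
    max_stream_count=2,
)

# The response carries the reference schema, the stream list, and the
# session expiration time.
print("Streams:", [stream.name for stream in session.streams])
print("Expires at:", session.expire_time)
```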
Read from a session stream
Data from a given stream is retrieved by invoking the ReadRows streaming RPC. Once the read request for a Stream is initiated, the backend will begin transmitting blocks of serialized row data. RPC flow control ensures that the server does not transmit more data when the client is not ready to receive. If the client does not request data for more than 1 hour, then the server suspects that the stream is stalled and closes it to free up resources for other streams. If there is an error, you can restart reading a stream at a particular point by supplying the row offset when you call ReadRows.
To support dynamic work rebalancing, the Storage Read API provides an additional method to split a Stream into two child Stream instances whose contents are, together, equal to the contents of the parent Stream. For more information, see the API reference.
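Continuing the sketch from the previous section, the loop below reads every stream returned in the session. The client library's rows() helper handles flow control and deserialization, and the commented line shows how a failed read could be resumed at a row offset. The column names are the ones selected in the earlier example.

```python
# Continues the previous sketch: `client` and `session` already exist.
# Decoding Avro row blocks requires the fastavro extra
# (pip install "google-cloud-bigquery-storage[fastavro]").
for stream in session.streams:
    reader = client.read_rows(stream.name)

    # rows() yields dict-like rows decoded from the serialized row blocks.
    for row in reader.rows(session):
        print(row["name"], row["number"])

    # If the read fails partway through, it can be resumed at an offset:
    # reader = client.read_rows(stream.name, offset=rows_already_read)
```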
Decode row blocks
Row blocks must be deserialized once they are received. Users of the Storage Read API may specify that all data in a session be serialized in either Apache Avro or Apache Arrow format.
The reference schema is sent as part of the initial ReadSession response, appropriate for the data format selected. In most cases, decoders can be long-lived because the schema and serialization are consistent among all streams and row blocks in a session.
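If you consume the ReadRows responses directly instead of going through the client library's rows() helper, you decode each row block against the session's reference schema yourself. The following is a rough sketch for the Avro case using fastavro, reusing the client and session from the earlier example (created with data_format=AVRO); the decoding loop structure is an assumption, not the only way to do it.

```python
import io
import json

import fastavro

# `client` and `session` come from the create-session sketch (AVRO format).
# The reference schema is a JSON string; parse it once and reuse it for
# every row block in the session.
parsed_schema = fastavro.parse_schema(json.loads(session.avro_schema.schema))

reader = client.read_rows(session.streams[0].name)
for response in reader:
    # Each ReadRowsResponse carries one block of serialized rows.
    buf = io.BytesIO(response.avro_rows.serialized_binary_rows)
    for _ in range(response.row_count):
        record = fastavro.schemaless_reader(buf, parsed_schema)
        print(record)
```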
Schema conversion
Avro schema details
Due to type system differences between BigQuery and the Avro specification, Avro schemas may include additional annotations that identify how to map the Avro types to BigQuery representations. When compatible, Avro base types and logical types are used. The Avro schema may also include additional annotations for types present in BigQuery that do not have a well-defined Avro representation.
To represent nullable columns, unions with the Avro NULL type are used.
| GoogleSQL type | Avro type | Avro schema annotations | Notes |
|---|---|---|---|
| BOOLEAN | boolean | | |
| INT64 | long | | |
| FLOAT64 | double | | |
| BYTES | bytes | | |
| STRING | string | | |
| DATE | int | logicalType: date | |
| DATETIME | string | logicalType: datetime | |
| TIMESTAMP | long | logicalType: timestamp-micros | |
| TIME | long | logicalType: time-micros | |
| NUMERIC | bytes | logicalType: decimal (precision = 38, scale = 9) | |
| NUMERIC(P[, S]) | bytes | logicalType: decimal (precision = P, scale = S) | |
| BIGNUMERIC | bytes | logicalType: decimal (precision = 77, scale = 38) | |
| BIGNUMERIC(P[, S]) | bytes | logicalType: decimal (precision = P, scale = S) | |
| GEOGRAPHY | string | sqlType: GEOGRAPHY | |
| ARRAY | array | | |
| STRUCT | record | | |
| JSON | string | sqlType: JSON | |
| RANGE<T> | record | sqlType: RANGE | Contains the range's start and end fields, each using the Avro representation of the element type T. A null field denotes an unbounded range boundary. |
Arrow schema details
The Apache Arrow format works well with Python data science workloads.
For cases where multiple BigQuery types converge on a single Arrow data type, the metadata property of the Arrow schema field indicates the original data type.
If you're working in an older version of the Storage Read API, then use the appropriate version of Arrow as follows:
- v1beta1: Arrow 0.14 and earlier
- v1: Arrow 0.15 and later
Regardless of API version, to access API functions, we recommend that you use the BigQuery Storage API client libraries. The libraries can be used with any version of Arrow and don't prevent you from upgrading Arrow.
| GoogleSQL type | Arrow logical type | Notes |
|---|---|---|
| BOOLEAN | Boolean | |
| INT64 | Int64 | |
| FLOAT64 | Double | |
| BYTES | Binary | |
| STRING | Utf8 | |
| DATE | Date | 32-bit days since epoch |
| DATETIME | Timestamp | Microsecond precision, no timezone |
| TIMESTAMP | Timestamp | Microsecond precision, UTC timezone |
| TIME | Time | Microsecond precision |
| NUMERIC | Decimal | Precision = 38, scale = 9 |
| NUMERIC(P[, S]) | Decimal | Precision = P, scale = S |
| BIGNUMERIC | Decimal256 | Precision = 76, scale = 38 |
| BIGNUMERIC(P[, S]) | Decimal256 | Precision = P, scale = S |
| GEOGRAPHY | Utf8 | |
| ARRAY | List | |
| STRUCT | Struct | |
| JSON | Utf8 | |
| RANGE<T> | Struct | Contains start and end fields of type ARROW_TYPE(T), where ARROW_TYPE(T) is the Arrow type representation of the range element type T. A null field denotes an unbounded range boundary. For example, RANGE<DATE> is represented as a struct with two Arrow Date fields. |
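As a sketch of the equivalent manual decoding for Arrow: the session carries a serialized Arrow schema and each response carries a serialized record batch, which pyarrow can reassemble. This assumes a session created with data_format=ARROW and reuses the client from the earlier sketches.

```python
import pyarrow as pa

# Assumes `client` and a `session` created with
# data_format=types.DataFormat.ARROW (see the earlier sketch).
arrow_schema = pa.ipc.read_schema(
    pa.py_buffer(session.arrow_schema.serialized_schema)
)

batches = []
for stream in session.streams:
    reader = client.read_rows(stream.name)
    for response in reader:
        batches.append(
            pa.ipc.read_record_batch(
                pa.py_buffer(
                    response.arrow_record_batch.serialized_record_batch
                ),
                arrow_schema,
            )
        )

table = pa.Table.from_batches(batches, schema=arrow_schema)
# Field metadata on the schema indicates the original BigQuery type when
# several BigQuery types map to the same Arrow type.
print(table.schema)
```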
Limitations
Because the Storage Read API operates on storage, you cannot use the Storage Read API to directly read from logical or materialized views. As a workaround, you can execute a BigQuery query over the view and use the Storage Read API to read from the resulting table. Some connectors, including the Spark-BigQuery connector, support this workflow natively.
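A minimal sketch of this workaround with the Python client libraries, assuming a placeholder view name: run a query over the view, then point the Storage Read API at the query job's destination table.

```python
from google.cloud import bigquery, bigquery_storage
from google.cloud.bigquery_storage import types

project_id = "your-project-id"  # placeholder

# 1. Materialize the view's contents by running a query over it.
bq_client = bigquery.Client(project=project_id)
query_job = bq_client.query(
    "SELECT * FROM `your-project-id.your_dataset.your_view`"  # placeholder view
)
query_job.result()  # Wait for the query to finish.

# 2. Read the query's destination table with the Storage Read API.
dest = query_job.destination
table_path = (
    f"projects/{dest.project}/datasets/{dest.dataset_id}/tables/{dest.table_id}"
)

read_client = bigquery_storage.BigQueryReadClient()
session = read_client.create_read_session(
    parent=f"projects/{project_id}",
    read_session=types.ReadSession(
        table=table_path,
        data_format=types.DataFormat.AVRO,
    ),
    max_stream_count=1,
)
```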
Reading external tables is not supported. To use the Storage Read API with external data sources, use BigLake tables.
Supported regions
The Storage Read API is supported in the same regions as BigQuery. See the Dataset locations page for a complete list of supported regions and multi-regions.
Data locality
Data locality is the process of moving the computation closer to the location where the data resides. Data locality impacts both the peak throughput and consistency of performance.
BigQuery determines the location to run your load, query, or extract jobs based on the datasets referenced in the request. For information about location considerations, see BigQuery locations.
Troubleshoot errors
The following are common errors encountered when using the Storage Read API:
- Error: Stream removed

  Resolution: Retry the Storage Read API request. This is likely a transient error that can be resolved by retrying the request. If the problem persists, contact support.

- Error: Stream expired

  Cause: This error occurs when the Storage Read API session reaches the 6-hour timeout.

  Resolution:

  - Increase the parallelism of the job.
  - If the CPU utilization of the worker nodes is relatively consistent and doesn't spike above 85%, consider running the job on a larger machine type.
  - Split the job into multiple jobs or smaller queries.
Quotas and limits
For Storage Read API quotas and limits, see Storage Read API limits.
Monitor Storage Read API use
To monitor the data egress and processing associated with the Storage Read API, specific fields are available in the BigQuery Audit Logs. These logs provide a detailed view of the bytes scanned and the bytes returned to the client.
The relevant API method for these logs is google.cloud.bigquery.storage.v1.BigQueryRead.ReadRows.
| Field name | Data type | Notes |
|---|---|---|
| serialized_response_bytes | INT64 | The total number of bytes sent to the client over the network, after serialization. This field helps you track data egress. |
| scanned_bytes | INT64 | The total number of bytes scanned from BigQuery storage to fulfill the request. This value is used to calculate the analysis cost of the read operation. |
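As a hedged sketch of how you might pull these entries programmatically, the following uses the Cloud Logging client library with a filter on the method name above; the exact log name, filter syntax for your setup, and the path of the byte-count fields inside the audit payload may need adjusting.

```python
import itertools

from google.cloud import logging

project_id = "your-project-id"  # placeholder
client = logging.Client(project=project_id)

# Data-access audit log entries for Storage Read API ReadRows calls.
log_filter = (
    f'logName="projects/{project_id}/logs/'
    'cloudaudit.googleapis.com%2Fdata_access" '
    'AND protoPayload.methodName='
    '"google.cloud.bigquery.storage.v1.BigQueryRead.ReadRows"'
)

# Print a handful of matching entries; the serialized_response_bytes and
# scanned_bytes values appear inside each entry's audit metadata.
for entry in itertools.islice(client.list_entries(filter_=log_filter), 10):
    print(entry.timestamp, entry.payload)
```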
Pricing
For information on Storage Read API pricing, see the Pricing page.