Synchronize online and offline datasets with BigQuery DataFrames

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Using Bigtable with BigQuery, you can build a real-time analytics database and use it in machine learning (ML) pipelines. This lets you keep your data in sync, supporting data manipulation and model development (offline access) and low-latency application serving (online access).

To build your real-time analytics database, you can use BigQuery DataFrames, a set of open-source Python libraries for BigQuery data processing. BigQuery DataFrames lets you develop and train models in BigQuery and automatically replicate a copy of the latest data values used for your ML models in Bigtable for online serving.

This document provides an overview of using the bigframes.streaming API to create BigQuery jobs that automatically replicate and synchronize datasets across BigQuery and Bigtable. Before you read this document, make sure that you understand the following documents:

BigQuery DataFrames

BigQuery DataFrames helps you develop and train models in BigQuery and automatically replicate a copy of the latest data values used for your ML models in Bigtable for online serving. It lets you do the following:

  • Develop data transformations in a pandas-compatible interface (bigframes.pandas) directly against BigQuery data
  • Train models using a scikit-learn-like API (bigframes.ml)
  • Synchronize the data needed for low-latency inference with Bigtable (bigframes.streaming) to support user-facing applications
Note: If you want to batch export from BigQuery to Bigtable using SQL, you can set up a reverse extract-transform-load (ETL) process. For more information, see Export data to Bigtable (Reverse ETL).
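Because bigframes.pandas mirrors the pandas API, the data-transformation step described above can be sketched locally with plain pandas. The table and column names below are hypothetical; on a bigframes.pandas DataFrame, the same method calls compile to SQL that runs in BigQuery instead of in local memory.

```python
# Local pandas sketch of the kind of transformation that
# bigframes.pandas expresses against BigQuery data.
# Column names and values here are illustrative only.
import pandas as pd

penguins = pd.DataFrame(
    {
        "species": ["Adelie", "Gentoo", "Adelie"],
        "body_mass_g": [3750.0, 5000.0, 3800.0],
    }
)

# Filter rows and aggregate -- the same method names work on a
# bigframes.pandas DataFrame, where they are pushed down to BigQuery.
heavy = penguins[penguins["body_mass_g"] > 3760]
mean_mass = heavy.groupby("species")["body_mass_g"].mean()
print(mean_mass.to_dict())  # {'Adelie': 3800.0, 'Gentoo': 5000.0}
```

The point of the shared API surface is that code written and tested this way locally carries over to BigQuery-scale data with minimal changes.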

BigFrames StreamingDataFrame

bigframes.streaming.StreamingDataFrame is a DataFrame type in the BigQuery DataFrames package. It lets you create a StreamingDataFrame object that can be used to generate a continuously running job that streams data from a designated BigQuery table into Bigtable for online serving. This is done by generating BigQuery continuous queries.

A BigFrames StreamingDataFrame can do the following:

  • Create a StreamingDataFrame from a designated BigQuery table
  • Optionally, perform additional pandas operations such as select and filter, and preview the content
  • Create and manage streaming jobs to Bigtable
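The operations above are deferred: steps are recorded on the object and only executed when a terminal call (such as to_bigtable) turns them into a continuous query. The toy class below is not the real bigframes API; it is a minimal, purely local sketch of that deferred, chainable pattern.

```python
# Toy stand-in (NOT the real bigframes API) illustrating the deferred,
# chainable style of a StreamingDataFrame: select and filter steps are
# recorded, and nothing runs until a terminal call. In the real library,
# to_bigtable() compiles the recorded steps into a continuous query.
class ToyStreamingFrame:
    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def select(self, *columns):
        # Record a projection step; return a new immutable frame.
        return ToyStreamingFrame(self.source, self.steps + (("select", columns),))

    def filter(self, predicate):
        # Record a filter step expressed as a predicate string.
        return ToyStreamingFrame(self.source, self.steps + (("filter", predicate),))

    def describe_plan(self):
        # A terminal call: here it just renders the recorded plan.
        return [self.source] + [f"{op}: {arg}" for op, arg in self.steps]

sdf = (
    ToyStreamingFrame("birds.penguins")
    .select("species", "body_mass_g")
    .filter("body_mass_g > 3000")
)
print(sdf.describe_plan())
```

Each chained call returns a new object, so intermediate frames can be reused and previewed without side effects, which matches the pandas-style workflow described above.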

Required roles

To get the permissions that you need to use BigQuery DataFrames in a BigQuery notebook, ask your administrator to grant you the following IAM roles:

To get the permissions that you need to write data to a Bigtable table, ask your administrator to grant you the following IAM roles:

Get started

BigQuery DataFrames is an open-source package. To install the latest version, run pip install --upgrade bigframes.

To create your first BigFrames StreamingDataFrame and synchronize data between BigQuery and Bigtable, run the following code snippet. For the complete code sample, see the GitHub notebook BigFrames StreamingDataFrame.

import bigframes
import bigframes.streaming as bst

bigframes.options.bigquery.project = "PROJECT"

sdf = bst.read_gbq_table("birds.penguins_bigtable_streaming")

job = sdf.to_bigtable(
    instance="BIGTABLE_INSTANCE",
    table="TABLE",
    app_profile=None,
    truncate=True,
    overwrite=True,
    auto_create_column_families=True,
    bigtable_options={},
    job_id=None,
    job_id_prefix="test_streaming_",
)

print(job.running())
print(job.error_result)

Replace the following:

  • PROJECT: the ID of your Google Cloud project
  • BIGTABLE_INSTANCE: the ID of the Bigtable instance that contains the table you are writing to
  • TABLE: the ID of the Bigtable table that you are writing to
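If you script this setup, it can help to validate the placeholder substitution before calling to_bigtable(). The helper below is hypothetical; it simply collects the keyword arguments used in the snippet above and rejects values that were left as placeholders.

```python
# Hypothetical helper: gathers the to_bigtable() keyword arguments from
# the snippet above and fails fast if a placeholder was not replaced.
_PLACEHOLDERS = {"PROJECT", "BIGTABLE_INSTANCE", "TABLE"}

def bigtable_sink_kwargs(project, instance, table, job_id_prefix="test_streaming_"):
    for name, value in {"project": project, "instance": instance, "table": table}.items():
        if not value or value in _PLACEHOLDERS:
            raise ValueError(f"replace the {name} placeholder with a real ID")
    return {
        "instance": instance,
        "table": table,
        "app_profile": None,
        "truncate": True,
        "overwrite": True,
        "auto_create_column_families": True,
        "bigtable_options": {},
        "job_id": None,
        "job_id_prefix": job_id_prefix,
    }

kwargs = bigtable_sink_kwargs("my-project", "my-instance", "penguins")
print(kwargs["table"])  # penguins
```

With real IDs in place, the returned dictionary can be passed to to_bigtable() with **kwargs.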

Once the job is initialized, it runs as a continuous query inBigQuery and streams any data changes to Bigtable.
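A caller often wants to block until the continuous query is actually running before relying on the Bigtable copy. Assuming the returned job handle exposes running() and error_result as in the snippet above, a polling sketch might look like this; the backoff helper is plain, illustrative Python.

```python
import time

def backoff_schedule(initial=1.0, factor=2.0, cap=30.0, attempts=6):
    """Exponential backoff delays in seconds, capped at `cap`."""
    delay, schedule = initial, []
    for _ in range(attempts):
        schedule.append(min(delay, cap))
        delay *= factor
    return schedule

def wait_until_running(job, schedule):
    """Poll a job handle until it reports running, with backoff.

    Assumes `job` exposes running() and error_result, as in the
    to_bigtable() snippet above.
    """
    for delay in schedule:
        if job.running():
            return True
        if job.error_result:  # surface failures instead of spinning
            raise RuntimeError(job.error_result)
        time.sleep(delay)
    return False

print(backoff_schedule())  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

The cap keeps the polling interval bounded, which matters for a continuous query that may take a little while to transition into the running state.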

Costs

There are no additional charges for using the BigQuery DataFrames API, but you are charged for the underlying resources used for continuous queries, Bigtable, and BigQuery.

Continuous queries use BigQuery capacity compute pricing, which is measured in slots. To run continuous queries, you must have a reservation that uses the Enterprise or Enterprise Plus edition and a reservation assignment that uses the CONTINUOUS job type.
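Because a continuous query holds reserved slot capacity around the clock, a rough cost estimate is just slots × rate × hours. The per-slot-hour rate below is a placeholder, not current pricing; check the BigQuery pricing page for your edition and region.

```python
# Back-of-the-envelope slot cost estimate for a continuous query.
# The $/slot-hour rate is a PLACEHOLDER -- look up the current
# Enterprise edition rate for your region before relying on this.
def monthly_slot_cost(slots, rate_per_slot_hour, hours=730):
    """Cost of keeping `slots` reserved around the clock for a month
    (~730 hours)."""
    return slots * rate_per_slot_hour * hours

# e.g. a small 50-slot baseline at a hypothetical $0.06/slot-hour:
print(round(monthly_slot_cost(50, 0.06), 2))  # 2190.0
```

The same arithmetic makes it easy to compare a small always-on baseline reservation against autoscaling limits when sizing the reservation for the CONTINUOUS job type.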

Usage of other BigQuery resources, such as data ingestion and storage, is charged at the rates shown in BigQuery pricing.

Usage of Bigtable services that receive continuous query results is charged at the Bigtable pricing rates.

Limitations

All feature and location limitations associated with continuous queries also apply to streaming DataFrames.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.