Introduction to BigQuery DataFrames

BigQuery DataFrames is a set of open source Python libraries that let you take advantage of BigQuery data processing by using familiar Python APIs. BigQuery DataFrames provides a Pythonic DataFrame powered by the BigQuery engine, and it implements the pandas and scikit-learn APIs by pushing the processing down to BigQuery through SQL conversion. This lets you use BigQuery to explore and process terabytes of data, and also train machine learning (ML) models, all with Python APIs.
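As an illustrative sketch of the pandas-style workflow described above, the following function reads a table, aggregates it with familiar DataFrame methods, and downloads only the result. The function name, the public penguins table, and its column names are assumptions chosen for illustration. Running it requires the bigframes package and Google Cloud credentials.

```python
def mean_body_mass_by_species(table_id="bigquery-public-data.ml_datasets.penguins"):
    """Return the mean body mass per species as a local pandas object.

    The default table and the column names below are assumptions made
    for this sketch; substitute a table you have access to.
    """
    # Imported lazily so this file still loads where bigframes is not installed.
    import bigframes.pandas as bpd

    # Build a DataFrame backed by the BigQuery table; the rows stay in BigQuery.
    df = bpd.read_gbq(table_id)

    # pandas-style operations are converted to SQL instead of running locally.
    means = df[["species", "body_mass_g"]].groupby("species")["body_mass_g"].mean()

    # to_pandas() executes the query and downloads the small aggregated result.
    return means.to_pandas()
```

Calling mean_body_mass_by_species() with no arguments uses the assumed public table. Pass a fully qualified table ID to run the same aggregation on your own data, provided it has matching columns.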

The following diagram describes the workflow of BigQuery DataFrames:

[Diagram: BigQuery DataFrames workflow]

Note: There are breaking changes to some default parameters in BigQuery DataFrames version 2.0. To learn about these changes and how to migrate to version 2.0, see Migrate to BigQuery DataFrames 2.0.

BigQuery DataFrames benefits

BigQuery DataFrames does the following:

  • Offers more than 750 pandas and scikit-learn APIs implemented through transparent SQL conversion to BigQuery and BigQuery ML APIs.
  • Defers the execution of queries for enhanced performance.
  • Extends data transformations with user-defined Python functions to let you process data in Google Cloud. These functions are automatically deployed as BigQuery remote functions.
  • Integrates with Vertex AI to let you use Gemini models for text generation.
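To illustrate the scikit-learn-style side of the first item above, here is a minimal sketch that trains a linear regression with the bigframes.ml module. The function name, the public penguins table, and the feature and label columns are assumptions for illustration only. Running it requires the bigframes package and Google Cloud credentials.

```python
def fit_body_mass_model(table_id="bigquery-public-data.ml_datasets.penguins"):
    """Train a linear model in BigQuery and return its predictions.

    The table and column names are assumptions made for this sketch.
    """
    # Imported lazily so this file still loads where bigframes is not installed.
    import bigframes.pandas as bpd
    from bigframes.ml.linear_model import LinearRegression

    # Drop rows with missing values before training.
    df = bpd.read_gbq(table_id).dropna()

    # Feature matrix and label, selected with pandas-style indexing.
    X = df[["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm"]]
    y = df[["body_mass_g"]]

    # fit() follows the scikit-learn convention; the training itself
    # is submitted to BigQuery rather than run on the local machine.
    model = LinearRegression()
    model.fit(X, y)

    # predict() returns a BigQuery DataFrames object, not a local array.
    return model.predict(X)
```

The returned predictions stay in BigQuery until you call a method such as to_pandas() on them, so the sketch moves no training data to the client.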

Licensing

BigQuery DataFrames is distributed with the Apache-2.0 license.

BigQuery DataFrames also contains code derived from third-party packages. For details, see the third_party/bigframes_vendored directory in the BigQuery DataFrames GitHub repository.

Quotas and limits

  • BigQuery quotas apply to BigQuery DataFrames, including hardware, software, and network components.
  • A subset of pandas and scikit-learn APIs are supported. For more information, see Supported pandas APIs.
  • You must explicitly clean up any automatically created Cloud Run functions as part of session cleanup. For more information, see Supported pandas APIs.
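One way to make session cleanup hard to forget is to wrap the work in a try/finally block, as in the sketch below. The helper name run_with_session_cleanup is hypothetical. The sketch also assumes that closing the global session with close_session() is the cleanup step that applies to your version, so check the documentation for the release you use.

```python
def run_with_session_cleanup(work):
    """Run work(bpd) and always close the global session afterwards.

    `work` is any callable that takes the bigframes.pandas module.
    This helper is a hypothetical convenience, not part of the library.
    """
    # Imported lazily so this file still loads where bigframes is not installed.
    import bigframes.pandas as bpd

    try:
        return work(bpd)
    finally:
        # Runs whether work() returned or raised, so the session
        # is closed even when a query fails partway through.
        bpd.close_session()
```

For example, run_with_session_cleanup(lambda bpd: bpd.read_gbq("my-project.my_dataset.my_table").shape) would return the table's shape and then close the session. The table ID here is a placeholder.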

Pricing

  • BigQuery DataFrames is a set of open source Python libraries available for download at no extra cost.
  • BigQuery DataFrames uses BigQuery, Cloud Run functions, Vertex AI, and other Google Cloud services, which incur their own costs.
  • During regular usage, BigQuery DataFrames stores temporary data, such as intermediate results, in BigQuery tables. These tables persist for seven days by default, and you are charged for the data stored in them. The tables are created in the _anonymous_ dataset in the Google Cloud project you specify in the bf.options.bigquery.project option.
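Because the project option determines where temporary tables are stored and billed, it can help to set it explicitly before the first query. The following is a minimal sketch, assuming bigframes.pandas is imported under the alias bpd. The helper name configure_bigframes, the placeholder project ID, and the default location are assumptions for illustration.

```python
def configure_bigframes(project_id, location="US"):
    """Set the billing project and location before any query runs.

    This helper is hypothetical; it only assigns two option values.
    """
    # Imported lazily so this file still loads where bigframes is not installed.
    import bigframes.pandas as bpd

    # Temporary tables are created in this project, so storage
    # charges for intermediate results are billed to it.
    bpd.options.bigquery.project = project_id

    # Location of the session; "US" is an assumed default for this sketch.
    bpd.options.bigquery.location = location

    return bpd
```

A call such as bpd = configure_bigframes("my-project-id") would typically go at the top of a script, before any read_gbq call, since these options apply to the session that the first query starts.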


Last updated 2025-12-15 UTC.