BigQuery DataFrames (also known as BigFrames)
googleapis/python-bigquery-dataframes
BigQuery DataFrames (also known as BigFrames) provides a Pythonic DataFrame and machine learning (ML) API powered by the BigQuery engine.
- `bigframes.pandas` provides a pandas API for analytics. Many workloads can be migrated from pandas to bigframes by just changing a few imports.
- `bigframes.ml` provides a scikit-learn-like API for ML.
BigQuery DataFrames is an open-source package.
Version 2.0 introduces breaking changes for improved security and performance. See below for details.
The easiest way to get started is to try the BigFrames quickstart in a notebook in BigQuery Studio.
To use BigFrames in your local development environment:
- Run `pip install --upgrade bigframes` to install the latest version.
- Set up Application Default Credentials for your local development environment.
- Create a GCP project with the BigQuery API enabled.
- Use the `bigframes` package to query data.
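```python
import bigframes.pandas as bpd

# Replace the placeholder with the ID of your GCP project.
bpd.options.bigquery.project = "your_gcp_project_id"

df = bpd.read_gbq("bigquery-public-data.usa_names.usa_1910_2013")

# Sum the counts per name, keep the 10 most common names,
# and download the result as a local pandas DataFrame.
print(
    df.groupby("name")
    .agg({"number": "sum"})
    .sort_values("number", ascending=False)
    .head(10)
    .to_pandas()
)
```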
To learn more about BigQuery DataFrames, visit these pages
Version 2.0 introduces breaking changes for improved security and performance. Key default behaviors have changed, including:
- Large Results (>10GB): The default value for `allow_large_results` has changed to `False`. Methods like `to_pandas()` will now fail if the query result's compressed data size exceeds 10GB, unless large results are explicitly permitted.
- Remote Function Security: The library no longer automatically lets the Compute Engine default service account become the identity of the Cloud Run functions. If that is desired, it has to be indicated by passing `cloud_function_service_account="default"`. Network ingress now defaults to `"internal-only"`.
- @remote_function Argument Passing: Arguments other than `input_types`, `output_type`, and `dataset` to `remote_function` must now be passed using keyword syntax, as positional arguments are no longer supported.
- @udf Argument Passing: Arguments `dataset` and `name` to `udf` are now mandatory.
- Endpoint Connections: Automatic fallback to locational endpoints in certain regions is removed.
- LLM Updates (Gemini Integration): Integrations now default to the `gemini-2.0-flash-001` model. PaLM2 support has been removed; please migrate any existing PaLM2 usage to Gemini. Note: The current default model will be removed in Version 3.0.
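As a rough illustration of two of the changes above, the sketch below shows one way to permit large results on a single download and to collect `remote_function` arguments in keyword form. Both helper names are hypothetical and not part of BigFrames. The `allow_large_results` keyword on `to_pandas()` is an assumption based on the option name above, so check it against the version you have installed.

```python
def download_to_pandas(df, allow_large=False):
    """Download query results as a local pandas DataFrame.

    `df` is assumed to be a bigframes.pandas DataFrame. Passing
    `allow_large_results` to `to_pandas()` is an assumption based on the
    option name in the list above; this helper is hypothetical.
    """
    return df.to_pandas(allow_large_results=allow_large)


def build_remote_function_options(dataset, use_default_service_account=False):
    """Collect keyword arguments for `remote_function` (hypothetical helper).

    Only parameter names mentioned in the list above are used. The input
    and output types are example values.
    """
    options = {
        "input_types": [float],
        "output_type": float,
        "dataset": dataset,
    }
    if use_default_service_account:
        # Opt back in to the Compute Engine default service account,
        # which version 2.0 no longer uses automatically.
        options["cloud_function_service_account"] = "default"
    return options
```

With `import bigframes.pandas as bpd`, the resulting dictionary could then be unpacked into the decorator, for example `bpd.remote_function(**build_remote_function_options("my_dataset"))`, which keeps every argument in keyword form.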
Important: If you are not ready to adapt to these changes, please pin your dependency to a version less than 2.0 (e.g., `bigframes==1.42.0`) to avoid disruption.
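To check at runtime whether the 2.0 behavior applies to an environment, a small guard along these lines may help. It is a minimal sketch using only the standard library; the helper names are hypothetical, and the parsing assumes plain version strings such as `1.42.0` or `2.0.0`.

```python
from importlib.metadata import PackageNotFoundError, version


def breaking_changes_apply(version_string):
    """Return True if the version string has major version 2 or higher."""
    major = int(version_string.split(".")[0])
    return major >= 2


def installed_bigframes_version():
    """Return the installed bigframes version, or None if it is not installed."""
    try:
        return version("bigframes")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_bigframes_version()
    if found is None:
        print("bigframes is not installed")
    elif breaking_changes_apply(found):
        print(f"bigframes {found}: version 2.0 defaults apply")
    else:
        print(f"bigframes {found}: pre-2.0 defaults apply")
```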
To learn about these changes and how to migrate to version 2.0, see the updated introduction guide.
BigQuery DataFrames is distributed with the Apache-2.0 license.
It also contains code derived from the following third-party packages:
For details, see the `third_party` directory.
For further help or to provide feedback, you can email us at bigframes-feedback@google.com.