Manipulate data with BigQuery DataFrames
This document describes the data manipulation capabilities available with BigQuery DataFrames. The functions that this document describes are available in the bigframes.bigquery library.
Required roles
To get the permissions that you need to complete the tasks in this document, ask your administrator to grant you the following IAM roles on your project:
- BigQuery Job User (roles/bigquery.jobUser)
- BigQuery Read Session User (roles/bigquery.readSessionUser)
- Use BigQuery DataFrames in a BigQuery notebook:
  - BigQuery User (roles/bigquery.user)
  - Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
  - Code Creator (roles/dataform.codeCreator)
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
When you perform end user authentication in an interactive environment like a notebook, a Python REPL, or the command line, BigQuery DataFrames prompts for authentication if needed. Otherwise, see how to set up application default credentials for various environments.
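For example, here is a minimal session setup sketch for a script environment, assuming application default credentials are already configured; the project ID and location values are placeholders, not values from this document:

import bigframes.pandas as bpd

# Placeholder project ID and location; replace with your own values.
bpd.options.bigquery.project = "your-project-id"
bpd.options.bigquery.location = "US"

# The first query creates the session, prompting for authentication if
# application default credentials are not already set up.
bq_df = bpd.read_gbq("bigquery-public-data.ml_datasets.penguins")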
pandas API
A notable feature of BigQuery DataFrames is that the bigframes.pandas API is designed to be similar to APIs in the pandas library. This design lets you employ familiar syntax patterns for data manipulation tasks. Operations defined through the BigQuery DataFrames API are executed server side, operating directly on data stored within BigQuery and eliminating the need to transfer datasets out of BigQuery.
To check which pandas APIs are supported by BigQuery DataFrames, see Supported pandas APIs.
Inspect and manipulate data
You can use the bigframes.pandas API to perform data inspection and calculation operations. The following code sample uses the bigframes.pandas library to inspect the body_mass_g column, calculate the mean body_mass_g, and calculate the mean body_mass_g by species:
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Inspect one of the columns (or series) of the DataFrame:
bq_df["body_mass_g"]

# Compute the mean of this series:
average_body_mass = bq_df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")

# Find the heaviest species using the groupby operation to calculate the
# mean body_mass_g:
(
    bq_df["body_mass_g"]
    .groupby(by=bq_df["species"])
    .mean()
    .sort_values(ascending=False)
    .head(10)
)

BigQuery library
The BigQuery library provides BigQuery SQL functions that might not have a pandas equivalent. The following sections present some examples.
Process array values
You can use the bigframes.bigquery.array_agg() function in the bigframes.bigquery library to aggregate values after a groupby operation:
import bigframes.bigquery as bbq
import bigframes.pandas as bpd

s = bpd.Series([0, 1, 2, 3, 4, 5])

# Group values by whether they are divisible by 2 and aggregate them into arrays
bbq.array_agg(s.groupby(s % 2 == 0))
# False    [1 3 5]
# True     [0 2 4]
# dtype: list<item: int64>[pyarrow]

You can also use the array_length() and array_to_string() array functions, as shown in the following sketch.
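The following is a minimal sketch of those two functions; the input series of string arrays is illustrative and not part of the penguins sample dataset:

import bigframes.bigquery as bbq
import bigframes.pandas as bpd

# An illustrative series of string arrays (hypothetical example data).
s = bpd.Series([["A", "AA"], ["B"], ["C", "CC", "CCC"]])

# array_length() returns the number of elements in each array:
# expected values 2, 1, 3.
bbq.array_length(s)

# array_to_string() joins each array of strings with the given delimiter:
# expected values "A, AA", "B", "C, CC, CCC".
bbq.array_to_string(s, ", ")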
Create a struct Series object
You can use the bigframes.bigquery.struct() function in the bigframes.bigquery library to create a new struct Series object with subfields for each column in a DataFrame:
import bigframes.bigquery as bbq
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"
bq_df = bpd.read_gbq(query_or_table)

# Create a new STRUCT Series with subfields for each column in a DataFrame.
lengths = bbq.struct(
    bq_df[["culmen_length_mm", "culmen_depth_mm", "flipper_length_mm"]]
)

lengths.peek()
# 146    {'culmen_length_mm': 51.1, 'culmen_depth_mm': ...
# 278    {'culmen_length_mm': 48.2, 'culmen_depth_mm': ...
# 337    {'culmen_length_mm': 36.4, 'culmen_depth_mm': ...
# 154    {'culmen_length_mm': 46.5, 'culmen_depth_mm': ...
# 185    {'culmen_length_mm': 50.1, 'culmen_depth_mm': ...
# dtype: struct[pyarrow]

Convert timestamps to Unix epochs
You can use the bigframes.bigquery.unix_micros() function in the bigframes.bigquery library to convert timestamps into Unix microseconds:
import pandas as pd

import bigframes.bigquery as bbq
import bigframes.pandas as bpd

# Create a series that consists of three timestamps: [1970-01-01, 1970-01-02, 1970-01-03]
s = bpd.Series(pd.date_range("1970-01-01", periods=3, freq="d", tz="UTC"))

bbq.unix_micros(s)
# 0               0
# 1     86400000000
# 2    172800000000
# dtype: Int64

You can also use the unix_seconds() and unix_millis() time functions, as shown in the following sketch.
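Here is a short sketch applying those related functions to the same three timestamps; the expected values follow directly from 86,400 seconds per day:

import pandas as pd

import bigframes.bigquery as bbq
import bigframes.pandas as bpd

s = bpd.Series(pd.date_range("1970-01-01", periods=3, freq="d", tz="UTC"))

# Seconds since the Unix epoch: expected values 0, 86400, 172800.
bbq.unix_seconds(s)

# Milliseconds since the Unix epoch: expected values 0, 86400000, 172800000.
bbq.unix_millis(s)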
Use the SQL scalar function
You can use the bigframes.bigquery.sql_scalar() function in the bigframes.bigquery library to access arbitrary SQL syntax that represents a single-column expression:
import bigframes.bigquery as bbq
import bigframes.pandas as bpd

# Load data from BigQuery
query_or_table = "bigquery-public-data.ml_datasets.penguins"

# The sql_scalar function can be used to inject SQL syntax that is not supported
# or difficult to express with the bigframes.pandas APIs.
bq_df = bpd.read_gbq(query_or_table)
shortest = bbq.sql_scalar(
    "LEAST({0}, {1}, {2})",
    columns=[
        bq_df["culmen_depth_mm"],
        bq_df["culmen_length_mm"],
        bq_df["flipper_length_mm"],
    ],
)

shortest.peek()
#           0
# 149    18.9
# 33     16.3
# 296    17.2
# 287    17.0
# 307    15.0
# dtype: Float64

What's next
- Learn about custom Python functions for BigQuery DataFrames.
- Learn how to generate BigQuery DataFrames code with Gemini.
- Learn how to analyze package downloads from PyPI with BigQuery DataFrames.
- View BigQuery DataFrames source code, sample notebooks, and samples on GitHub.
- Explore the BigQuery DataFrames API reference.