Try BigQuery DataFrames
Use this quickstart to perform the following analysis and machine learning (ML) tasks by using the BigQuery DataFrames API in a BigQuery notebook:
- Create a DataFrame over the bigquery-public-data.ml_datasets.penguins public dataset.
- Calculate the average body mass of a penguin.
- Create a linear regression model.
- Create a DataFrame over a subset of the penguin data to use as training data.
- Clean up the training data.
- Set the model parameters.
- Fit the model.
- Score the model.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.
Roles required to select or create a project:
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
- Verify that billing is enabled for your Google Cloud project.
- Verify that the BigQuery API is enabled. If you created a new project, the BigQuery API is automatically enabled.
Required permissions
To create and run notebooks, you need the following Identity and Access Management (IAM) roles:
- BigQuery User (roles/bigquery.user)
- Notebook Runtime User (roles/aiplatform.notebookRuntimeUser)
- Code Creator (roles/dataform.codeCreator)
Create a notebook
Follow the instructions in Create a notebook from the BigQuery editor to create a new notebook.
Try BigQuery DataFrames
Try BigQuery DataFrames by following these steps:
- Create a new code cell in the notebook.
- Add the following code to the code cell:

```python
import bigframes.pandas as bpd

# Set BigQuery DataFrames options
# Note: The project option is not required in all environments.
# On BigQuery Studio, the project ID is automatically detected.
bpd.options.bigquery.project = your_gcp_project_id

# Use "partial" ordering mode to generate more efficient queries, but the
# order of the rows in DataFrames may not be deterministic if you have not
# explicitly sorted it. Some operations that depend on the order, such as
# head(), will not function until you explicitly order the DataFrame. Set the
# ordering mode to "strict" (default) for more pandas compatibility.
bpd.options.bigquery.ordering_mode = "partial"

# Create a DataFrame from a BigQuery table
query_or_table = "bigquery-public-data.ml_datasets.penguins"
df = bpd.read_gbq(query_or_table)

# Efficiently preview the results using the .peek() method.
df.peek()
```

- Modify the bpd.options.bigquery.project = your_gcp_project_id line to specify your Google Cloud project ID. For example, bpd.options.bigquery.project = "myProjectID".
- Run the code cell.
The code returns a DataFrame object with data about penguins.
- Create a new code cell in the notebook and add the following code:
```python
# Use the DataFrame just as you would a pandas DataFrame, but calculations
# happen in the BigQuery query engine instead of the local system.
average_body_mass = df["body_mass_g"].mean()
print(f"average_body_mass: {average_body_mass}")
```

- Run the code cell.
The code calculates the average body mass of the penguins and prints it to the Google Cloud console. An optional sketch after these steps shows a few more pandas-style operations on the same DataFrame.
- Create a new code cell in the notebook and add the following code:
```python
# Create the Linear Regression model
from bigframes.ml.linear_model import LinearRegression

# Filter down to the data we want to analyze
adelie_data = df[df.species == "Adelie Penguin (Pygoscelis adeliae)"]

# Drop the columns we don't care about
adelie_data = adelie_data.drop(columns=["species"])

# Drop rows with nulls to get our training data
training_data = adelie_data.dropna()

# Pick feature columns and label column
X = training_data[
    [
        "island",
        "culmen_length_mm",
        "culmen_depth_mm",
        "flipper_length_mm",
        "sex",
    ]
]
y = training_data[["body_mass_g"]]

model = LinearRegression(fit_intercept=False)
model.fit(X, y)
model.score(X, y)
```

- Run the code cell.
The code returns the model's evaluation metrics.
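Optionally, you can keep exploring the penguin data with the same DataFrame. The following sketch assumes the df DataFrame and the options set in the steps above; it relies on pandas-style methods that BigQuery DataFrames exposes, such as sort_values, groupby, and to_pandas, and on read_gbq accepting a SQL query as well as a table ID.

```python
import bigframes.pandas as bpd

# read_gbq() also accepts a SQL query instead of a table ID, so you can
# filter in SQL before the data reaches the DataFrame.
adelie_df = bpd.read_gbq(
    """
    SELECT *
    FROM `bigquery-public-data.ml_datasets.penguins`
    WHERE species = 'Adelie Penguin (Pygoscelis adeliae)'
    """
)
adelie_df.peek()

# With ordering_mode = "partial", order-dependent operations such as head()
# need an explicit sort first.
heaviest = adelie_df.sort_values("body_mass_g", ascending=False).head(5)
heaviest.peek()

# Pandas-style grouped aggregation on the df DataFrame created earlier;
# the computation runs in BigQuery, not on the local machine.
mean_mass_by_species = df.groupby("species")["body_mass_g"].mean()
print(mean_mass_by_species.to_pandas())
```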
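You can also use the fitted model beyond scoring. The following optional sketch assumes the model and X objects from the previous step; it uses the scikit-learn-style predict() method that bigframes.ml models provide, and to_gbq() to save the model to BigQuery. The my_dataset.penguin_weight_model name is a placeholder and assumes a dataset named my_dataset already exists in your project.

```python
# Generate predictions with the fitted model; the prediction runs in
# BigQuery ML and returns a BigQuery DataFrames DataFrame.
predictions = model.predict(X)
predictions.peek()

# Optionally save the model to BigQuery so that you can reuse it later.
# The dataset and model names below are placeholders.
model.to_gbq("my_dataset.penguin_weight_model", replace=True)
```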
Clean up
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Continue learning how to use BigQuery DataFrames.
- Learn how to visualize graphs using BigQuery DataFrames.
- Learn how to use a BigQuery DataFrames notebook.