Improve model performance with hyperparameter tuning

This tutorial teaches you how to use hyperparameter tuning in BigQuery ML to tune a machine learning model and improve its performance.

You perform hyperparameter tuning by specifying the NUM_TRIALS option of the CREATE MODEL statement, in combination with other model-specific options. When you set these options, BigQuery ML trains multiple versions, or trials, of the model, each with slightly different parameters, and returns the trial that performs the best.
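Conceptually, tuning is a loop over trials: pick hyperparameter values, train, score, and keep the best. The following minimal Python sketch shows only that loop and is not how BigQuery ML is implemented. The `tune` function and the toy objective are hypothetical, and plain random search stands in for the Vizier-based algorithm that BigQuery ML uses by default, which can use the results of completed trials to choose the next values.

```python
import random
from typing import Callable, List, Tuple

# (trial_id, hyperparameter value, objective score)
Trial = Tuple[int, float, float]


def tune(objective: Callable[[float], float], low: float, high: float,
         num_trials: int, seed: int = 0) -> Tuple[Trial, List[Trial]]:
    """Run num_trials trials, each with a hyperparameter value drawn uniformly
    from [low, high], and return (best_trial, all_trials).

    Plain random search is a stand-in for BigQuery ML's tuning algorithm.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    trials: List[Trial] = []
    for trial_id in range(1, num_trials + 1):
        value = rng.uniform(low, high)  # this trial's hyperparameter value
        score = objective(value)        # for example, R2 score on held-out data
        trials.append((trial_id, value, score))
    best = max(trials, key=lambda t: t[2])  # a higher score is better
    return best, trials


# Hypothetical toy objective whose score peaks when the hyperparameter is 3.
best, trials = tune(lambda l1: -(l1 - 3.0) ** 2, low=0.0, high=5.0, num_trials=20)
print(f"best trial {best[0]}: value={best[1]:.3f}, score={best[2]:.4f}")
```

Each trial is independent here, which is why a fixed seed is enough to make the run repeatable.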

This tutorial uses the public tlc_yellow_trips_2018 sample table, which contains information about taxi trips in New York City in 2018.

Objectives

This tutorial guides you through completing the following tasks:

  • Using the CREATE MODEL statement to create a baseline linear regression model.
  • Evaluating the baseline model by using the ML.EVALUATE function.
  • Using the CREATE MODEL statement with hyperparameter tuning options to train twenty trials of a linear regression model.
  • Reviewing the trials by using the ML.TRIAL_INFO function.
  • Evaluating the trials by using the ML.EVALUATE function.
  • Getting predictions about taxi trips from the optimal model among the trials by using the ML.PREDICT function.

Costs

This tutorial uses billable components of Google Cloud, including:

  • BigQuery
  • BigQuery ML

For more information about BigQuery costs, see the BigQuery pricing page.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

  3. Verify that billing is enabled for your Google Cloud project.

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, enable the BigQuery API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

Required permissions

  • To create the dataset, you need the bigquery.datasets.create IAM permission.

  • To create the model, you need the following permissions:

    • bigquery.jobs.create
    • bigquery.models.create
    • bigquery.models.getData
    • bigquery.models.updateData
  • To run inference, you need the following permissions:

    • bigquery.models.getData
    • bigquery.jobs.create

For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, do the following:

    • For Dataset ID, enter bqml_tutorial.

    • For Location type, select Multi-region, and then select US (multiple regions in United States).

    • Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
        --description "BigQuery ML tutorial dataset." \
        bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

# Create a client that authenticates with Application Default Credentials.
bqclient = google.cloud.bigquery.Client()
# exists_ok=True lets the call succeed if the dataset already exists.
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Create a table of training data

Create a table of training data, based on a subset of the tlc_yellow_trips_2018 table data. The query renames the tip_amount column to label because, when a CREATE MODEL statement doesn't set the INPUT_LABEL_COLS option, BigQuery ML uses the column named label as the value to predict.

Follow these steps to create the table:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    CREATE OR REPLACE TABLE `bqml_tutorial.taxi_tip_input` AS
    SELECT
      * EXCEPT (tip_amount),
      tip_amount AS label
    FROM
      `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
    WHERE
      tip_amount IS NOT NULL
    LIMIT 100000;

Create a baseline linear regression model

Create a linear regression model without hyperparameter tuning and train it on the taxi_tip_input table data.

Follow these steps to create the model:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    CREATE OR REPLACE MODEL `bqml_tutorial.baseline_taxi_tip_model`
      OPTIONS (MODEL_TYPE = 'LINEAR_REG') AS
    SELECT
      *
    FROM
      `bqml_tutorial.taxi_tip_input`;

    The query takes about 2 minutes to complete.

Evaluate the baseline model

Evaluate the performance of the model by using the ML.EVALUATE function. Because the query doesn't supply any input data, ML.EVALUATE returns the evaluation metrics that were calculated during training by comparing the tip amounts that the model predicts with the actual tip amounts in the evaluation data.

Follow these steps to evaluate the model:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.EVALUATE(MODEL `bqml_tutorial.baseline_taxi_tip_model`);

    The results look similar to the following:

    +---------------------+--------------------+------------------------+-----------------------+---------------------+---------------------+
    | mean_absolute_error | mean_squared_error | mean_squared_log_error | median_absolute_error |      r2_score       | explained_variance  |
    +---------------------+--------------------+------------------------+-----------------------+---------------------+---------------------+
    |  2.5853895559690323 | 23760.416358496139 |   0.017392406523370374 | 0.0044248227819481123 | -1934.5450533482465 | -1934.3513857946277 |
    +---------------------+--------------------+------------------------+-----------------------+---------------------+---------------------+

The r2_score value for the baseline model is negative, which indicates a poor fit for the data; the closer the R2 score is to 1, the better the model fit is.
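A negative score follows from the standard definition R2 = 1 - SS_res / SS_tot, where SS_res is the model's squared error and SS_tot is the squared error of always predicting the mean label. Any model whose squared error exceeds that of the mean scores below zero. The following minimal Python sketch shows the three cases; the `r2_score` helper is written for illustration and isn't BigQuery ML code.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need at least two paired values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # error of predicting the mean
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all labels are equal")
    return 1.0 - ss_res / ss_tot


labels = [1.0, 2.0, 3.0, 4.0]
print(r2_score(labels, [1.0, 2.0, 3.0, 4.0]))  # perfect fit: prints 1.0
print(r2_score(labels, [2.5, 2.5, 2.5, 2.5]))  # always predicts the mean: prints 0.0
print(r2_score(labels, [4.0, 3.0, 2.0, 1.0]))  # worse than the mean: prints -3.0
```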

Create a linear regression model with hyperparameter tuning

Create a linear regression model with hyperparameter tuning and train it on the taxi_tip_input table data.

You use the following hyperparameter tuning options in the CREATE MODEL statement:

  • The NUM_TRIALS option to set the number of trials to twenty.
  • The MAX_PARALLEL_TRIALS option to run two trials in each training job, for a total of ten jobs and twenty trials. Running trials in parallel reduces the total training time. However, the two concurrent trials don't benefit from each other's training results.
  • The L1_REG option to try different L1 regularization values, between 0 and 5, in the different trials. L1 regularization penalizes the absolute size of the feature weights and can shrink the weights of irrelevant features to zero, effectively removing those features from the model, which helps prevent overfitting.

The other hyperparameter tuning options supported by the model use their default values, as follows:

  • L2_REG: 0
  • HPARAM_TUNING_ALGORITHM: 'VIZIER_DEFAULT'
  • HPARAM_TUNING_OBJECTIVES: ['R2_SCORE']
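How an L1 penalty can remove features is easiest to see in the soft-thresholding operator that many L1 solvers, such as proximal gradient descent and coordinate descent for the lasso, apply to the weights. This is a generic sketch: it doesn't claim to show the optimizer that BigQuery ML uses internally, and `soft_threshold` is a hypothetical helper name. Weights whose magnitude is at or below the penalty become exactly zero, so a larger L1_REG value tends to zero out more features.

```python
from typing import List


def soft_threshold(weights: List[float], penalty: float) -> List[float]:
    """Proximal step for an L1 penalty: shrink each weight toward zero by
    `penalty`, and set it to exactly zero if its magnitude is <= `penalty`."""
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    shrunk = []
    for w in weights:
        if w > penalty:
            shrunk.append(w - penalty)
        elif w < -penalty:
            shrunk.append(w + penalty)
        else:
            shrunk.append(0.0)  # the feature no longer affects predictions
    return shrunk


print(soft_threshold([0.3, -2.0, 1.5, -0.1], penalty=0.5))  # [0.0, -1.5, 1.0, 0.0]
```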

Follow these steps to create the model:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    CREATE OR REPLACE MODEL `bqml_tutorial.hp_taxi_tip_model`
      OPTIONS (
        MODEL_TYPE = 'LINEAR_REG',
        NUM_TRIALS = 20,
        MAX_PARALLEL_TRIALS = 2,
        L1_REG = HPARAM_RANGE(0, 5)) AS
    SELECT
      *
    FROM
      `bqml_tutorial.taxi_tip_input`;

    The query takes approximately 20 minutes to complete.
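The 20-minute figure is consistent with a rough back-of-the-envelope estimate: 20 trials run 2 at a time make 10 rounds, and if each round takes about as long as the roughly 2-minute baseline training, the total is about 20 minutes. The sketch below encodes that assumption only; `estimate_tuning_minutes` is a hypothetical helper, and actual run time also depends on slot availability, data size, and each trial's hyperparameter values.

```python
import math


def estimate_tuning_minutes(num_trials: int, max_parallel_trials: int,
                            minutes_per_trial: float) -> float:
    """Rough wall-clock estimate, assuming trials run in sequential rounds of
    up to max_parallel_trials and each round lasts about one trial's time."""
    if num_trials < 1 or max_parallel_trials < 1:
        raise ValueError("num_trials and max_parallel_trials must be positive")
    rounds = math.ceil(num_trials / max_parallel_trials)
    return rounds * minutes_per_trial


print(estimate_tuning_minutes(20, 2, 2.0))  # 10 rounds of ~2 minutes: prints 20.0
```

Under the same assumption, raising MAX_PARALLEL_TRIALS shortens the estimate, at the cost of fewer completed trials for the tuning algorithm to learn from.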

Get information about the training trials

Get information about all of the trials, including their hyperparameter values, objectives, and status, by using the ML.TRIAL_INFO function. The function also indicates, in the is_optimal column, which trial performed best according to the tuning objective.

Follow these steps to get trial information:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.TRIAL_INFO(MODEL `bqml_tutorial.hp_taxi_tip_model`)
    ORDER BY
      is_optimal DESC;

    The results look similar to the following:

    +----------+-------------------------------------+-----------------------------------+--------------------+--------------------+-----------+---------------+------------+
    | trial_id |           hyperparameters           | hparam_tuning_evaluation_metrics  |   training_loss    |     eval_loss      |  status   | error_message | is_optimal |
    +----------+-------------------------------------+-----------------------------------+--------------------+--------------------+-----------+---------------+------------+
    |        7 |      {"l1_reg":"4.999999999999985"} |  {"r2_score":"0.653653627638174"} | 4.4677841296238165 |  4.478469742512195 | SUCCEEDED | NULL          |       true |
    |        2 |  {"l1_reg":"2.402163664510254E-11"} | {"r2_score":"0.6532493667964732"} |  4.457692508421795 |  4.483697081650438 | SUCCEEDED | NULL          |      false |
    |        3 |  {"l1_reg":"1.2929452948742316E-7"} |  {"r2_score":"0.653249366811995"} |   4.45769250849513 |  4.483697081449748 | SUCCEEDED | NULL          |      false |
    |        4 |  {"l1_reg":"2.5787102060628228E-5"} | {"r2_score":"0.6532493698925899"} |  4.457692523040582 |  4.483697041615808 | SUCCEEDED | NULL          |      false |
    |      ... |                             ...     |                           ...     |              ...   |             ...    |       ... |          ...  |        ... |
    +----------+-------------------------------------+-----------------------------------+--------------------+--------------------+-----------+---------------+------------+

    The is_optimal column value indicates that trial 7 is the optimal model returned by the tuning.
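With a single objective that is maximized, such as R2_SCORE, you can reproduce the is_optimal choice in the sample output by picking the successful trial with the highest objective value. The following Python sketch does that on rows copied from the sample ML.TRIAL_INFO output. The `optimal_trial` helper is hypothetical, and representing each row as a dict with the metrics already parsed into a nested dict is an assumption about how you might load the results, not the function's documented output type.

```python
from typing import Any, Dict, List


def optimal_trial(trials: List[Dict[str, Any]], objective: str = "r2_score") -> int:
    """Return the trial_id of the successful trial with the highest objective value."""
    succeeded = [t for t in trials if t["status"] == "SUCCEEDED"]
    if not succeeded:
        raise ValueError("no successful trials")
    # Metric values may arrive as strings (as in the printed output) or as numbers.
    best = max(succeeded,
               key=lambda t: float(t["hparam_tuning_evaluation_metrics"][objective]))
    return best["trial_id"]


# Rows copied from the sample ML.TRIAL_INFO output.
sample = [
    {"trial_id": 7, "status": "SUCCEEDED",
     "hparam_tuning_evaluation_metrics": {"r2_score": "0.653653627638174"}},
    {"trial_id": 2, "status": "SUCCEEDED",
     "hparam_tuning_evaluation_metrics": {"r2_score": "0.6532493667964732"}},
    {"trial_id": 3, "status": "SUCCEEDED",
     "hparam_tuning_evaluation_metrics": {"r2_score": "0.653249366811995"}},
    {"trial_id": 4, "status": "SUCCEEDED",
     "hparam_tuning_evaluation_metrics": {"r2_score": "0.6532493698925899"}},
]
print(optimal_trial(sample))  # prints 7
```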

Evaluate the tuned model trials

Evaluate the performance of the trials by using the ML.EVALUATE function. For a model trained with hyperparameter tuning, ML.EVALUATE returns one row of evaluation metrics for each trial, calculated by comparing the tip amounts that each trial predicts with the actual tip amounts in the held-out data.

Follow these steps to evaluate the model trials:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.EVALUATE(MODEL `bqml_tutorial.hp_taxi_tip_model`)
    ORDER BY
      r2_score DESC;

    The results look similar to the following:

    +----------+---------------------+--------------------+------------------------+-----------------------+--------------------+--------------------+
    | trial_id | mean_absolute_error | mean_squared_error | mean_squared_log_error | median_absolute_error |      r2_score      | explained_variance |
    +----------+---------------------+--------------------+------------------------+-----------------------+--------------------+--------------------+
    |        7 |   1.151814398002232 |  4.109811493266523 |     0.4918733252641176 |    0.5736103414025084 | 0.6652110305659145 | 0.6652144696114834 |
    |       19 |  1.1518143358927102 |  4.109811921460791 |     0.4918672150119582 |    0.5736106106914161 | 0.6652109956848206 | 0.6652144346901685 |
    |        8 |   1.152747850702547 |  4.123625876152422 |     0.4897808307399327 |    0.5731702310239184 | 0.6640856984144734 |  0.664088410199906 |
    |        5 |   1.152895108945439 |  4.125775524878872 |    0.48939088205957937 |    0.5723300569616766 | 0.6639105860807425 | 0.6639132416838652 |
    |      ... |                ...  |                ... |                    ... |                   ... |                ... |                ... |
    +----------+---------------------+--------------------+------------------------+-----------------------+--------------------+--------------------+

    The r2_score value for the optimal model, which is trial 7, is 0.6652110305659145, a significant improvement over the baseline model's r2_score of -1934.5450533482465.

You can evaluate a specific trial by specifying the TRIAL_ID argument in the ML.EVALUATE function.

For more information about the difference between ML.TRIAL_INFO objectives and ML.EVALUATE evaluation metrics, see Model serving functions.

Use the tuned model to predict taxi tips

Use the optimal model returned by tuning to predict tips for different taxi trips. The ML.PREDICT function automatically uses the optimal model, unless you select a different trial by specifying the TRIAL_ID argument. The predictions are returned in the predicted_label column.

Follow these steps to get predictions:

  1. In the Google Cloud console, go to the BigQuery page.

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.PREDICT(
        MODEL `bqml_tutorial.hp_taxi_tip_model`,
        (
          SELECT
            *
          FROM
            `bqml_tutorial.taxi_tip_input`
          LIMIT 5
        ));

    The results look similar to the following:

    +----------+--------------------+-----------+---------------------+---------------------+-----------------+---------------+-----------+--------------------+--------------+-------------+-------+---------+--------------+---------------+--------------+--------------------+---------------------+----------------+-----------------+-------+
    | trial_id |  predicted_label   | vendor_id |   pickup_datetime   |  dropoff_datetime   | passenger_count | trip_distance | rate_code | store_and_fwd_flag | payment_type | fare_amount | extra | mta_tax | tolls_amount | imp_surcharge | total_amount | pickup_location_id | dropoff_location_id | data_file_year | data_file_month | label |
    +----------+--------------------+-----------+---------------------+---------------------+-----------------+---------------+-----------+--------------------+--------------+-------------+-------+---------+--------------+---------------+--------------+--------------------+---------------------+----------------+-----------------+-------+
    |        7 |  1.343367839584448 | 2         | 2018-01-15 18:55:15 | 2018-01-15 18:56:18 |               1 |             0 | 1         | N                  | 1            |           0 |     0 |       0 |            0 |             0 |            0 | 193                | 193                 |           2018 |               1 |     0 |
    |        7 | -1.176072791783461 | 1         | 2018-01-08 10:26:24 | 2018-01-08 10:26:37 |               1 |             0 | 5         | N                  | 3            |        0.01 |     0 |       0 |            0 |           0.3 |         0.31 | 158                | 158                 |           2018 |               1 |     0 |
    |        7 |  3.839580104168765 | 1         | 2018-01-22 10:58:02 | 2018-01-22 12:01:11 |               1 |          16.1 | 1         | N                  | 1            |        54.5 |     0 |     0.5 |            0 |           0.3 |         55.3 | 140                | 91                  |           2018 |               1 |     0 |
    |        7 |  4.677393985230036 | 1         | 2018-01-16 10:14:35 | 2018-01-16 11:07:28 |               1 |            18 | 1         | N                  | 2            |        54.5 |     0 |     0.5 |            0 |           0.3 |         55.3 | 138                | 67                  |           2018 |               1 |     0 |
    |        7 |  7.938988937253062 | 2         | 2018-01-16 07:05:15 | 2018-01-16 08:06:31 |               1 |          17.8 | 1         | N                  | 1            |        54.5 |     0 |     0.5 |            0 |           0.3 |        66.36 | 132                | 255                 |           2018 |               1 | 11.06 |
    +----------+--------------------+-----------+---------------------+---------------------+-----------------+---------------+-----------+--------------------+--------------+-------------+-------+---------+--------------+---------------+--------------+--------------------+---------------------+----------------+-----------------+-------+
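If you work from Python, as in the Python sample in Create a dataset, you can also run the ML.PREDICT query with the google-cloud-bigquery client and read the predicted_label column. This is a minimal sketch: `fetch_predictions` is a hypothetical helper, it selects only three of the output columns, and it assumes `client` is a `google.cloud.bigquery.Client`, of which it uses only `query()` and the returned job's `result()`.

```python
from typing import Any, List, Tuple


def fetch_predictions(client: Any,
                      model: str = "bqml_tutorial.hp_taxi_tip_model",
                      source_table: str = "bqml_tutorial.taxi_tip_input",
                      limit: int = 5) -> List[Tuple[Any, Any, Any]]:
    """Run ML.PREDICT and return (trial_id, predicted_label, label) tuples."""
    sql = f"""
        SELECT trial_id, predicted_label, label
        FROM ML.PREDICT(
          MODEL `{model}`,
          (SELECT * FROM `{source_table}` LIMIT {int(limit)}))
    """
    rows = client.query(sql).result()  # result() waits for the query job to finish
    # BigQuery Row objects support lookup by column name.
    return [(row["trial_id"], row["predicted_label"], row["label"]) for row in rows]
```

For example, after `import google.cloud.bigquery`, call `fetch_predictions(google.cloud.bigquery.Client())` and compare each predicted_label with its label.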

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

  • You can delete the project you created.
  • Or you can keep the project and delete the dataset.

Delete your dataset

Deleting your project removes all datasets and all tables in the project. If you prefer to reuse the project, you can delete the dataset you created in this tutorial:

  1. If necessary, open the BigQuery page in the Google Cloud console.

  2. In the navigation panel, click the bqml_tutorial dataset you created.

  3. On the right side of the window, click Delete dataset. This action deletes the dataset and everything in it, including the taxi_tip_input table and both models.

  4. In the Delete dataset dialog, confirm the delete command by typing the name of your dataset (bqml_tutorial) and then click Delete.
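You can also delete the dataset from Python with the google-cloud-bigquery client. A minimal sketch, assuming `client` is a `google.cloud.bigquery.Client` for the project that holds bqml_tutorial; `delete_tutorial_dataset` is a hypothetical helper name.

```python
from typing import Any


def delete_tutorial_dataset(client: Any, dataset_id: str = "bqml_tutorial") -> None:
    """Delete the tutorial dataset and everything in it."""
    # delete_contents=True is needed because the dataset still holds the
    # taxi_tip_input table and the models; deleting a non-empty dataset fails
    # without it. not_found_ok=True makes the cleanup safe to run twice.
    client.delete_dataset(dataset_id, delete_contents=True, not_found_ok=True)
```

For example, after `import google.cloud.bigquery`, call `delete_tutorial_dataset(google.cloud.bigquery.Client())`.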

Delete your project

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.