Limit forecasted values for an ARIMA_PLUS time series model

This tutorial teaches you how to use limits to narrow the forecasted results returned by anARIMA_PLUS time series model. In this tutorial, you create two time series models over the same data, one model which uses limits and one model that doesn't use limits. This lets you compare the results returned by the modelsand understand the difference that specifying limits makes.

You use thenew_york.citibike_trips data to train the models in this tutorial. This dataset contains information about Citi Bike trips in New York City.

Before following this tutorial, you should be familiar with single time series forecasting. Complete theSingle time series forecasting from Google Analytics data tutorial for an introduction to this topic.

Required Permissions

  • To create the dataset, you need thebigquery.datasets.createIAM permission.

  • To create the model, you need the following permissions:

    • bigquery.jobs.create
    • bigquery.models.create
    • bigquery.models.getData
    • bigquery.models.updateData
  • To run inference, you need the following permissions:

    • bigquery.models.getData
    • bigquery.jobs.create

For more information about IAM roles and permissions inBigQuery, seeIntroduction to IAM.

Objectives

In this tutorial, you use the following:

Costs

This tutorial uses billable components of Google Cloud, including the following:

  • BigQuery
  • BigQuery ML

For more information about BigQuery costs, see theBigQuery pricing page.

For more information about BigQuery ML costs, seeBigQuery ML pricing.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.create permission.Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. Enable the BigQuery API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enable permission.Learn how to grant roles.

    Enable the API

  5. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.create permission.Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  6. Verify that billing is enabled for your Google Cloud project.

  7. Enable the BigQuery API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enable permission.Learn how to grant roles.

    Enable the API

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to theBigQuery page.

    Go to the BigQuery page

  2. In theExplorer pane, click your project name.

  3. ClickView actions > Create dataset

  4. On theCreate dataset page, do the following:

    • ForDataset ID, enterbqml_tutorial.

    • ForLocation type, selectMulti-region, and then selectUS (multiple regions in United States).

    • Leave the remaining default settings as they are, and clickCreate dataset.

bq

To create a new dataset, use thebq mk commandwith the--location flag. For a full list of possible parameters, see thebq mk --dataset commandreference.

  1. Create a dataset namedbqml_tutorial with the data location set toUSand a description ofBigQuery ML tutorial dataset:

    bq --location=US mk -d \ --description "BigQuery ML tutorial dataset." \ bqml_tutorial

    Instead of using the--dataset flag, the command uses the-d shortcut.If you omit-d and--dataset, the command defaults to creating adataset.

  2. Confirm that the dataset was created:

    bqls

API

Call thedatasets.insertmethod with a defineddataset resource.

{"datasetReference":{"datasetId":"bqml_tutorial"}}

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in theBigQuery quickstart using BigQuery DataFrames. For more information, see theBigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up ADC for a local development environment.

importgoogle.cloud.bigquerybqclient=google.cloud.bigquery.Client()bqclient.create_dataset("bqml_tutorial",exists_ok=True)

Visualize the time series you want to forecast

Before creating the model, it is useful to see what your input time serieslooks like.

SQL

In the following query, theFROM bigquery-public-data.new_york.citibike_tripsclause indicates that you are querying thecitibike_trips table in thenew_yorkdataset.

In theSELECT statement, the query uses theEXTRACT functionto extract the date information from thestarttime column. The query usestheCOUNT(*) clause to get the daily total number of Citi Bike trips.

#standardSQLSELECTEXTRACT(DATEfromstarttime)ASdate,COUNT(*)ASnum_tripsFROM`bigquery-public-data`.new_york.citibike_tripsGROUPBYdate

To run the query, use the following steps:

  1. In the Google Cloud console, click theCompose new query button.

  2. Enter the following GoogleSQL query in the query editor.

    #standardSQLSELECTEXTRACT(DATEfromstarttime)ASdate,COUNT(*)ASnum_tripsFROM`bigquery-public-data`.new_york.citibike_tripsGROUPBYdate
  3. ClickRun. The query results similar to the following.

    Query output.

  4. Use the Google Cloud console to chart the time series data. In theQuery results pane, click theVisualization tab. In theVisualization configuration pane, chooseBar for theVisualization type:

    Result_visualization.

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in theBigQuery quickstart using BigQuery DataFrames. For more information, see theBigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up ADC for a local development environment.

In the following sample,bigquery-public-data.new_york.citibike_tripsindicates that you are querying thecitibike_trips table in thenew_yorkdataset.

importbigframes.pandasasbpddf=bpd.read_gbq("bigquery-public-data.new_york.citibike_trips")features=bpd.DataFrame({"num_trips":df.starttime,"date":df["starttime"].dt.date,})num_trips=features.groupby(["date"]).count()num_trips.plot.line()

The result is similar to the following:Result_visualization

Create a time series model

Create a time series model, using the NYC Citi Bike trips data.

The following GoogleSQL query creates a model that forecasts daily totalbike trips. TheCREATE MODELstatement creates and trains a model namedbqml_tutorial.nyc_citibike_arima_model.

#standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id')ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;

TheOPTIONS(model_type='ARIMA_PLUS', time_series_timestamp_col='date', ...)clause indicates that you are creating anARIMA-basedtime series model. By default,auto_arima=TRUE,so theauto.ARIMA algorithm automatically tunes the hyperparameters inARIMA_PLUS models. The algorithm fits dozens of candidate models and choosesthe best one with the lowestAkaike information criterion (AIC).Additionally, because the default isdata_frequency='AUTO_FREQUENCY', the training process automatically infersthe data frequency of the input time series. TheCREATE MODEL statement usesdecompose_time_series=TRUEby default, so both the history and forecast parts of the time series are saved in the model. Setting the parametertime_series_id_col = 'start_station_id' causes the model to fit and forecast multiple time series using a single query based on thestart_station_id. You can use this information to further understand how the time series is forecastedby fetching the separate time series components such as seasonal periods.

Run theCREATE MODEL query to create and train your model:

  1. In the Google Cloud console, click theCompose new query button.

  2. Enter the following GoogleSQL query in the query editor.

    #standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id')ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;
  3. ClickRun.

    The query takes approximately 80 seconds to complete, after which you canaccess the (nyc_citibike_arima_model) model. Because the query uses aCREATE MODEL statement to create a model, there are no query results.

Note: You might wonder if United States holidays have an impact on the timeseries. You can try addingholiday_region='US'to theOPTIONS list in the query. This allows a more accurate modeling onthose United States holidays time points if there are indeed United States holiday patterns in the timeseries.

Forecast the time series and visualize the results

To explain how the time series is forecasted, visualize all the sub-time seriescomponents, such as seasonality and trend, using theML.FORECASTfunction.

To do this, follow these steps:

  1. In the Google Cloud console, click theCompose new query button.

  2. Enter the following GoogleSQL query in the query editor.

    #standardSQLSELECTforecast_timestampASforecast_timestamp,start_station_idASstart_station_id,history_valueAShistory_value,forecast_valueASforecast_valueFROM((SELECTDATE(forecast_timestamp)ASforecast_timestamp,NULLAShistory_value,forecast_valueASforecast_value,start_station_idASstart_station_id,FROMML.FORECAST(MODELbqml_tutorial.`nyc_citibike_arima_model`,STRUCT(365AShorizon,0.9ASconfidence_level)))UNIONALL(SELECTDATE(date_name)ASforecast_timestamp,num_tripsAShistory_value,NULLASforecast_value,start_station_idASstart_station_id,FROM(SELECTEXTRACT(DATEFROMstarttime)ASdate_name,COUNT(*)ASnum_trips,start_station_idASstart_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate_name,start_station_id)))WHEREstart_station_id=79ORDERBYforecast_timestamp,start_station_id
  3. ClickRun. The query results similar to the following:

    BQUI_chart.

  4. Use the Google Cloud console to chart the time series data. In theQuery results pane, click theVisualization tab:

    Result_visualization.

The chart shows that the forecasted values for the daily total number of CitiBike trips wherestart_station_id=79 are negative numbers, which isn't useful. Using a model with limits instead improves the forecasted data.

Create a time series model with limits

Create a time series model with limits, using the NYC Citi Bike trips data.

The following GoogleSQL query creates a model that forecasts daily totalbike trips. TheCREATE MODELstatement creates and trains a model namedbqml_tutorial.nyc_citibike_arima_model_with_limits.The key difference between this model and themodel you created previously is the addition of theforecast_limit_lower_bound=0 option. This option causes the model to only forecast values that are greater than 0, based on the values in the column specified by thetime_series_data_col argument, in this casenum_trips.

#standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id',forecast_limit_lower_bound=0)ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;

Run theCREATE MODEL query to create and train your model:

  1. In the Google Cloud console, click theCompose new query button.

  2. Enter the following GoogleSQL query in the query editor.

    #standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id',forecast_limit_lower_bound=0)ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;
  3. ClickRun.

    The query takes approximately 100 seconds to complete, after which you canaccess the (nyc_citibike_arima_model_with_limits) model. Because the queryuses aCREATE MODEL statement to create a model, there are no query results.

Note: You might wonder if United States holidays have an impact on the timeseries. You can try addingholiday_region='US'to theOPTIONS list in the query. This allows a more accurate modeling onthose United States holidays time points if there are indeed United States holiday patterns in the timeseries.

Forecast the time series by using the model with limits

  1. In the Google Cloud console, click theCompose new query button.

  2. Enter the following GoogleSQL query in the query editor.

    #standardSQLSELECTforecast_timestampASforecast_timestamp,start_station_idASstart_station_id,history_valueAShistory_value,forecast_valueASforecast_valueFROM((SELECTDATE(forecast_timestamp)ASforecast_timestamp,NULLAShistory_value,forecast_valueASforecast_value,start_station_idASstart_station_id,FROMML.FORECAST(MODELbqml_tutorial.`nyc_citibike_arima_model`,STRUCT(365AShorizon,0.9ASconfidence_level)))UNIONALL(SELECTDATE(date_name)ASforecast_timestamp,num_tripsAShistory_value,NULLASforecast_value,start_station_idASstart_station_id,FROM(SELECTEXTRACT(DATEFROMstarttime)ASdate_name,COUNT(*)ASnum_trips,start_station_idASstart_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate_name,start_station_id)))WHEREstart_station_id=79ORDERBYforecast_timestamp,start_station_id
  3. ClickRun.

    BQUI_chart.

  4. Use the Google Cloud console to chart the time series data. In theQuery results pane, click theVisualization tab:

    Result_visualization.

The ARIMA PLUS model detects that the daily total number of Citi Bike trips wherestart_station_id=79 is decreasing. Future forecasting values will follow this trend and give relatively smaller forecasting numbers the farther into the future you go. The chart shows that the forecasted values for the daily total number of CitiBike trips wherestart_station_id=79 are positive numbers, which is more useful. The model with limitsdetects that the daily total number of Citi Bike trips wherestart_station_id=79 is decreasing, but it still gives meaningful forecasting values.

As this tutorial shows, theforecast_limit_lower_bound andforecast_limit_upper_bound options can help you get more meaningful forecasting values in similar scenarios to the one shown here, such as when forecasting stock prices or future sales numbers.

Delete your dataset

Deleting your project removes all datasets and all tables in the project. If youprefer to reuse the project, you can delete the dataset you created in thistutorial:

  1. If necessary, open the BigQuery page in theGoogle Cloud console.

    Go to the BigQuery page

  2. In the navigation, click thebqml_tutorial dataset you created.

  3. ClickDelete dataset on the right side of the window.This action deletes the dataset, the table, and all the data.

  4. In theDelete dataset dialog, confirm the delete command by typingthe name of your dataset (bqml_tutorial) and then clickDelete.

Delete your project

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as anappspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to theManage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then clickDelete.
  3. In the dialog, type the project ID, and then clickShut down to delete the project.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.