Limit forecasted values for an ARIMA_PLUS time series model
This tutorial teaches you how to use limits to narrow the forecasted results returned by anARIMA_PLUS time series model. In this tutorial, you create two time series models over the same data, one model which uses limits and one model that doesn't use limits. This lets you compare the results returned by the modelsand understand the difference that specifying limits makes.
You use thenew_york.citibike_trips data to train the models in this tutorial. This dataset contains information about Citi Bike trips in New York City.
Before following this tutorial, you should be familiar with single time series forecasting. Complete theSingle time series forecasting from Google Analytics data tutorial for an introduction to this topic.
Required Permissions
To create the dataset, you need the
bigquery.datasets.createIAM permission.To create the model, you need the following permissions:
bigquery.jobs.createbigquery.models.createbigquery.models.getDatabigquery.models.updateData
To run inference, you need the following permissions:
bigquery.models.getDatabigquery.jobs.create
For more information about IAM roles and permissions inBigQuery, seeIntroduction to IAM.
Objectives
In this tutorial, you use the following:
- The
CREATE MODELstatement: to create a time series model. - The
ML.FORECASTfunction: to forecast daily total visits.
Costs
This tutorial uses billable components of Google Cloud, including the following:
- BigQuery
- BigQuery ML
For more information about BigQuery costs, see theBigQuery pricing page.
For more information about BigQuery ML costs, seeBigQuery ML pricing.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission.Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission.Learn how to grant roles.In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.Roles required to select or create a project
- Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
- Create a project: To create a project, you need the Project Creator role (
roles/resourcemanager.projectCreator), which contains theresourcemanager.projects.createpermission.Learn how to grant roles.
Verify that billing is enabled for your Google Cloud project.
Enable the BigQuery API.
Roles required to enable APIs
To enable APIs, you need the Service Usage Admin IAM role (
roles/serviceusage.serviceUsageAdmin), which contains theserviceusage.services.enablepermission.Learn how to grant roles.
Create a dataset
Create a BigQuery dataset to store your ML model.
Console
In the Google Cloud console, go to theBigQuery page.
In theExplorer pane, click your project name.
ClickView actions > Create dataset
On theCreate dataset page, do the following:
ForDataset ID, enter
bqml_tutorial.ForLocation type, selectMulti-region, and then selectUS (multiple regions in United States).
Leave the remaining default settings as they are, and clickCreate dataset.
bq
To create a new dataset, use thebq mk commandwith the--location flag. For a full list of possible parameters, see thebq mk --dataset commandreference.
Create a dataset named
bqml_tutorialwith the data location set toUSand a description ofBigQuery ML tutorial dataset:bq --location=US mk -d \ --description "BigQuery ML tutorial dataset." \ bqml_tutorial
Instead of using the
--datasetflag, the command uses the-dshortcut.If you omit-dand--dataset, the command defaults to creating adataset.Confirm that the dataset was created:
bqls
API
Call thedatasets.insertmethod with a defineddataset resource.
{"datasetReference":{"datasetId":"bqml_tutorial"}}
BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in theBigQuery quickstart using BigQuery DataFrames. For more information, see theBigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up ADC for a local development environment.
importgoogle.cloud.bigquerybqclient=google.cloud.bigquery.Client()bqclient.create_dataset("bqml_tutorial",exists_ok=True)Visualize the time series you want to forecast
Before creating the model, it is useful to see what your input time serieslooks like.
SQL
In the following query, theFROM bigquery-public-data.new_york.citibike_tripsclause indicates that you are querying thecitibike_trips table in thenew_yorkdataset.
In theSELECT statement, the query uses theEXTRACT functionto extract the date information from thestarttime column. The query usestheCOUNT(*) clause to get the daily total number of Citi Bike trips.
#standardSQLSELECTEXTRACT(DATEfromstarttime)ASdate,COUNT(*)ASnum_tripsFROM`bigquery-public-data`.new_york.citibike_tripsGROUPBYdate
To run the query, use the following steps:
In the Google Cloud console, click theCompose new query button.
Enter the following GoogleSQL query in the query editor.
#standardSQLSELECTEXTRACT(DATEfromstarttime)ASdate,COUNT(*)ASnum_tripsFROM`bigquery-public-data`.new_york.citibike_tripsGROUPBYdate
ClickRun. The query results similar to the following.

Use the Google Cloud console to chart the time series data. In theQuery results pane, click theVisualization tab. In theVisualization configuration pane, chooseBar for theVisualization type:

BigQuery DataFrames
Before trying this sample, follow the BigQuery DataFrames setup instructions in theBigQuery quickstart using BigQuery DataFrames. For more information, see theBigQuery DataFrames reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, seeSet up ADC for a local development environment.
In the following sample,bigquery-public-data.new_york.citibike_tripsindicates that you are querying thecitibike_trips table in thenew_yorkdataset.
importbigframes.pandasasbpddf=bpd.read_gbq("bigquery-public-data.new_york.citibike_trips")features=bpd.DataFrame({"num_trips":df.starttime,"date":df["starttime"].dt.date,})num_trips=features.groupby(["date"]).count()num_trips.plot.line()The result is similar to the following:
Create a time series model
Create a time series model, using the NYC Citi Bike trips data.
The following GoogleSQL query creates a model that forecasts daily totalbike trips. TheCREATE MODELstatement creates and trains a model namedbqml_tutorial.nyc_citibike_arima_model.
#standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id')ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;
TheOPTIONS(model_type='ARIMA_PLUS', time_series_timestamp_col='date', ...)clause indicates that you are creating anARIMA-basedtime series model. By default,auto_arima=TRUE,so theauto.ARIMA algorithm automatically tunes the hyperparameters inARIMA_PLUS models. The algorithm fits dozens of candidate models and choosesthe best one with the lowestAkaike information criterion (AIC).Additionally, because the default isdata_frequency='AUTO_FREQUENCY', the training process automatically infersthe data frequency of the input time series. TheCREATE MODEL statement usesdecompose_time_series=TRUEby default, so both the history and forecast parts of the time series are saved in the model. Setting the parametertime_series_id_col = 'start_station_id' causes the model to fit and forecast multiple time series using a single query based on thestart_station_id. You can use this information to further understand how the time series is forecastedby fetching the separate time series components such as seasonal periods.
Run theCREATE MODEL query to create and train your model:
In the Google Cloud console, click theCompose new query button.
Enter the following GoogleSQL query in the query editor.
#standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id')ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;
ClickRun.
The query takes approximately 80 seconds to complete, after which you canaccess the (
nyc_citibike_arima_model) model. Because the query uses aCREATE MODELstatement to create a model, there are no query results.
OPTIONS list in the query. This allows a more accurate modeling onthose United States holidays time points if there are indeed United States holiday patterns in the timeseries.Forecast the time series and visualize the results
To explain how the time series is forecasted, visualize all the sub-time seriescomponents, such as seasonality and trend, using theML.FORECASTfunction.
To do this, follow these steps:
In the Google Cloud console, click theCompose new query button.
Enter the following GoogleSQL query in the query editor.
#standardSQLSELECTforecast_timestampASforecast_timestamp,start_station_idASstart_station_id,history_valueAShistory_value,forecast_valueASforecast_valueFROM((SELECTDATE(forecast_timestamp)ASforecast_timestamp,NULLAShistory_value,forecast_valueASforecast_value,start_station_idASstart_station_id,FROMML.FORECAST(MODELbqml_tutorial.`nyc_citibike_arima_model`,STRUCT(365AShorizon,0.9ASconfidence_level)))UNIONALL(SELECTDATE(date_name)ASforecast_timestamp,num_tripsAShistory_value,NULLASforecast_value,start_station_idASstart_station_id,FROM(SELECTEXTRACT(DATEFROMstarttime)ASdate_name,COUNT(*)ASnum_trips,start_station_idASstart_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate_name,start_station_id)))WHEREstart_station_id=79ORDERBYforecast_timestamp,start_station_id
ClickRun. The query results similar to the following:

Use the Google Cloud console to chart the time series data. In theQuery results pane, click theVisualization tab:

The chart shows that the forecasted values for the daily total number of CitiBike trips wherestart_station_id=79 are negative numbers, which isn't useful. Using a model with limits instead improves the forecasted data.
Create a time series model with limits
Create a time series model with limits, using the NYC Citi Bike trips data.
The following GoogleSQL query creates a model that forecasts daily totalbike trips. TheCREATE MODELstatement creates and trains a model namedbqml_tutorial.nyc_citibike_arima_model_with_limits.The key difference between this model and themodel you created previously is the addition of theforecast_limit_lower_bound=0 option. This option causes the model to only forecast values that are greater than 0, based on the values in the column specified by thetime_series_data_col argument, in this casenum_trips.
#standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id',forecast_limit_lower_bound=0)ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;
Run theCREATE MODEL query to create and train your model:
In the Google Cloud console, click theCompose new query button.
Enter the following GoogleSQL query in the query editor.
#standardSQLCREATEORREPLACEMODELbqml_tutorial.nyc_citibike_arima_modelOPTIONS(model_type='ARIMA_PLUS',time_series_timestamp_col='date',time_series_data_col='num_trips',time_series_id_col='start_station_id',forecast_limit_lower_bound=0)ASSELECTEXTRACT(DATEFROMstarttime)ASdate,COUNT(*)ASnum_trips,start_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate,start_station_id;
ClickRun.
The query takes approximately 100 seconds to complete, after which you canaccess the (
nyc_citibike_arima_model_with_limits) model. Because the queryuses aCREATE MODELstatement to create a model, there are no query results.
OPTIONS list in the query. This allows a more accurate modeling onthose United States holidays time points if there are indeed United States holiday patterns in the timeseries.Forecast the time series by using the model with limits
In the Google Cloud console, click theCompose new query button.
Enter the following GoogleSQL query in the query editor.
#standardSQLSELECTforecast_timestampASforecast_timestamp,start_station_idASstart_station_id,history_valueAShistory_value,forecast_valueASforecast_valueFROM((SELECTDATE(forecast_timestamp)ASforecast_timestamp,NULLAShistory_value,forecast_valueASforecast_value,start_station_idASstart_station_id,FROMML.FORECAST(MODELbqml_tutorial.`nyc_citibike_arima_model`,STRUCT(365AShorizon,0.9ASconfidence_level)))UNIONALL(SELECTDATE(date_name)ASforecast_timestamp,num_tripsAShistory_value,NULLASforecast_value,start_station_idASstart_station_id,FROM(SELECTEXTRACT(DATEFROMstarttime)ASdate_name,COUNT(*)ASnum_trips,start_station_idASstart_station_idFROM`bigquery-public-data`.new_york.citibike_tripsWHEREstarttime>'2014-07-11'ANDstarttime<'2015-02-11'GROUPBYdate_name,start_station_id)))WHEREstart_station_id=79ORDERBYforecast_timestamp,start_station_id
ClickRun.

Use the Google Cloud console to chart the time series data. In theQuery results pane, click theVisualization tab:

The ARIMA PLUS model detects that the daily total number of Citi Bike trips wherestart_station_id=79 is decreasing. Future forecasting values will follow this trend and give relatively smaller forecasting numbers the farther into the future you go. The chart shows that the forecasted values for the daily total number of CitiBike trips wherestart_station_id=79 are positive numbers, which is more useful. The model with limitsdetects that the daily total number of Citi Bike trips wherestart_station_id=79 is decreasing, but it still gives meaningful forecasting values.
As this tutorial shows, theforecast_limit_lower_bound andforecast_limit_upper_bound options can help you get more meaningful forecasting values in similar scenarios to the one shown here, such as when forecasting stock prices or future sales numbers.
Delete your dataset
Deleting your project removes all datasets and all tables in the project. If youprefer to reuse the project, you can delete the dataset you created in thistutorial:
If necessary, open the BigQuery page in theGoogle Cloud console.
In the navigation, click thebqml_tutorial dataset you created.
ClickDelete dataset on the right side of the window.This action deletes the dataset, the table, and all the data.
In theDelete dataset dialog, confirm the delete command by typingthe name of your dataset (
bqml_tutorial) and then clickDelete.
Delete your project
To delete the project:
What's next
- Learn how toperform multiple time-series forecasting with a single query from NYC Citi Bike trips data.
- Learn how toaccelerate ARIMA_PLUS to enable forecast 1 million time series within hours.
- To learn more about machine learning, see theMachine learning crash course.
- For an overview of BigQuery ML, seeIntroduction to BigQuery ML.
- To learn more about the Google Cloud console, seeUsing the Google Cloud console.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.