Forecast a single time series with an ARIMA_PLUS univariate model

This tutorial teaches you how to use an ARIMA_PLUS univariate time series model to forecast the future value for a given column based on the historical values for that column.

This tutorial forecasts a single time series. Forecasted values are calculated once for each time point in the input data.

This tutorial uses data from the public bigquery-public-data.google_analytics_sample.ga_sessions sample table. This table contains obfuscated ecommerce data from the Google Merchandise Store.

Objectives

This tutorial guides you through completing the following tasks:

  • Creating a BigQuery dataset to store the model.
  • Visualizing the input time series data.
  • Creating an ARIMA_PLUS time series model by using the CREATE MODEL statement.
  • Evaluating the candidate models by using the ML.ARIMA_EVALUATE function.
  • Inspecting the model's coefficients by using the ML.ARIMA_COEFFICIENTS function.
  • Forecasting data by using the ML.FORECAST function.
  • Explaining the forecasting results by using the ML.EXPLAIN_FORECAST function.

Costs

This tutorial uses billable components of Google Cloud, including the following:

  • BigQuery
  • BigQuery ML

For more information about BigQuery costs, see the BigQuery pricing page.

For more information about BigQuery ML costs, see BigQuery ML pricing.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, go to

    Enable the BigQuery API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

Required permissions

  • To create the dataset, you need the bigquery.datasets.create IAM permission.

  • To create the model, you need the following permissions:

    • bigquery.jobs.create
    • bigquery.models.create
    • bigquery.models.getData
    • bigquery.models.updateData
  • To run inference, you need the following permissions:

    • bigquery.models.getData
    • bigquery.jobs.create

For more information about IAM roles and permissions in BigQuery, see Introduction to IAM.

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to the BigQuery page

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, do the following:

    • For Dataset ID, enter bqml_tutorial.

    • For Location type, select Multi-region, and then select US (multiple regions in United States).

    • Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
        --description "BigQuery ML tutorial dataset." \
        bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{
  "datasetReference": {
    "datasetId": "bqml_tutorial"
  }
}

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Visualize the input data

Before creating the model, you can optionally visualize your input time series data to get a sense of the distribution. You can do this by using Looker Studio.

Follow these steps to visualize the time series data:

SQL

In the following GoogleSQL query, the SELECT statement parses the date column from the input table to the TIMESTAMP type and renames it to parsed_date, and uses the SUM(...) clause and the GROUP BY date clause to create a daily totals.visits value.

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT
      PARSE_TIMESTAMP("%Y%m%d", date) AS parsed_date,
      SUM(totals.visits) AS total_visits
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    GROUP BY date;

    1. When the query completes, click Open in > Looker Studio. Looker Studio opens in a new tab. Complete the following steps in the new tab.

    2. In Looker Studio, click Insert > Time series chart.

    3. In the Chart pane, choose the Setup tab.

    4. In the Metric section, add the total_visits field, and remove the default Record Count metric. The resulting chart looks similar to the following:

      A chart visualizing the daily total_visits time series.

      Looking at the chart, you can see that the input time series has a weekly seasonal pattern.

      Note: For more information about Looker Studio support, see Looker Support integrations with Google Cloud.

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import bigframes.pandas as bpd

# Start by loading the historical data from BigQuery that you want to analyze and forecast.
# This clause indicates that you are querying the ga_sessions_* tables in the google_analytics_sample dataset.
# Read and visualize the time series you want to forecast.
df = bpd.read_gbq("bigquery-public-data.google_analytics_sample.ga_sessions_*")

parsed_date = bpd.to_datetime(df.date, format="%Y%m%d", utc=True)
parsed_date.name = "parsed_date"
visits = df["totals"].struct.field("visits")
visits.name = "total_visits"
total_visits = visits.groupby(parsed_date).sum()

# Expected output: total_visits.head()
# parsed_date
# 2016-08-01 00:00:00+00:00    1711
# 2016-08-02 00:00:00+00:00    2140
# 2016-08-03 00:00:00+00:00    2890
# 2016-08-04 00:00:00+00:00    3161
# 2016-08-05 00:00:00+00:00    2702
# Name: total_visits, dtype: Int64

total_visits.plot.line()

The result is similar to the following: a chart visualizing the daily total_visits time series.

Create the time series model

Create a time series model to forecast total site visits as represented by the totals.visits column, and train it on the Google Analytics 360 data.

SQL

In the following query, the OPTIONS(model_type='ARIMA_PLUS', time_series_timestamp_col='parsed_date', ...) clause indicates that you are creating an ARIMA-based time series model. The auto_arima option of the CREATE MODEL statement defaults to TRUE, so the auto.ARIMA algorithm automatically tunes the hyperparameters in the model. The algorithm fits dozens of candidate models and chooses the best model, which is the model with the lowest Akaike information criterion (AIC). The data_frequency option of the CREATE MODEL statement defaults to AUTO_FREQUENCY, so the training process automatically infers the data frequency of the input time series. The decompose_time_series option of the CREATE MODEL statement defaults to TRUE, so that information about the time series data is returned when you evaluate the model in the next step.

Follow these steps to create the model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    CREATE OR REPLACE MODEL `bqml_tutorial.ga_arima_model`
      OPTIONS(
        model_type='ARIMA_PLUS',
        time_series_timestamp_col='parsed_date',
        time_series_data_col='total_visits',
        auto_arima=TRUE,
        data_frequency='AUTO_FREQUENCY',
        decompose_time_series=TRUE
      ) AS
    SELECT
      PARSE_TIMESTAMP("%Y%m%d", date) AS parsed_date,
      SUM(totals.visits) AS total_visits
    FROM
      `bigquery-public-data.google_analytics_sample.ga_sessions_*`
    GROUP BY date;

    The query takes about 4 seconds to complete, after which you can access the ga_arima_model model. Because the query uses a CREATE MODEL statement to create a model, you don't see query results.

Note: You might wonder if United States holidays have an impact on the time series. You can try setting the holiday_region option of the CREATE MODEL statement to US. Setting this option allows more accurate modeling of holiday time points if there are any holiday patterns in the time series.
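
The following is a minimal sketch of that experiment, not part of the tutorial's required steps. It re-runs the CREATE MODEL statement with the holiday_region option added, using the google.cloud.bigquery client to submit the query:

from google.cloud import bigquery

# A hedged sketch: re-create the tutorial model with US holiday modeling
# enabled. Only the added holiday_region option differs from the CREATE MODEL
# statement shown above.
client = bigquery.Client()
sql = """
CREATE OR REPLACE MODEL `bqml_tutorial.ga_arima_model`
  OPTIONS(model_type='ARIMA_PLUS',
          time_series_timestamp_col='parsed_date',
          time_series_data_col='total_visits',
          holiday_region='US')
AS
SELECT
  PARSE_TIMESTAMP("%Y%m%d", date) AS parsed_date,
  SUM(totals.visits) AS total_visits
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
GROUP BY date;
"""
client.query(sql).result()  # wait for the CREATE MODEL job to finish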

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

from bigframes.ml import forecasting
import bigframes.pandas as bpd

# Create a time series model to forecast total site visits:
# The auto_arima option defaults to True, so the auto.ARIMA algorithm automatically
# tunes the hyperparameters in the model.
# The data_frequency option defaults to 'auto_frequency', so the training
# process automatically infers the data frequency of the input time series.
# The decompose_time_series option defaults to True, so that information about
# the time series data is returned when you evaluate the model in the next step.
model = forecasting.ARIMAPlus()
model.auto_arima = True
model.data_frequency = "auto_frequency"
model.decompose_time_series = True

# Use the data loaded in the previous step to fit the model.
training_data = total_visits.to_frame().reset_index(drop=False)

X = training_data[["parsed_date"]]
y = training_data[["total_visits"]]

model.fit(X, y)

Evaluate the candidate models

SQL

Evaluate the time series models by using the ML.ARIMA_EVALUATE function. The ML.ARIMA_EVALUATE function shows you the evaluation metrics of all the candidate models evaluated during the process of automatic hyperparameter tuning.

Follow these steps to evaluate the model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.ARIMA_EVALUATE(MODEL `bqml_tutorial.ga_arima_model`);

    The results should look similar to the following:

    ML.ARIMA_EVALUATE output.

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

# Evaluate the time series models by using the summary() function. The summary()
# function shows you the evaluation metrics of all the candidate models evaluated
# during the process of automatic hyperparameter tuning.
summary = model.summary(
    show_all_candidate_models=True,
)
print(summary.peek())

# Expected output:
#    non_seasonal_p  non_seasonal_d  non_seasonal_q  has_drift  log_likelihood          AIC      variance  seasonal_periods  has_holiday_effect  has_spikes_and_dips  has_step_changes  error_message
# 0               0               1               3       True    -2464.255656  4938.511313  42772.506055        ['WEEKLY']               False                False              True
# 1               2               1               0      False    -2473.141651  4952.283303  44942.416463        ['WEEKLY']               False                False              True
# 2               1               1               0      False    -2479.880885   4963.76177  46642.953433        ['WEEKLY']               False                False              True
# 3               0               1               1      False    -2470.632377  4945.264753  44319.379307        ['WEEKLY']               False                False              True
# 4               2               1               1       True    -2463.671247  4937.342493  42633.299513        ['WEEKLY']               False                False              True

The non_seasonal_p, non_seasonal_d, non_seasonal_q, and has_drift output columns define an ARIMA model in the training pipeline. The log_likelihood, AIC, and variance output columns are relevant to the ARIMA model fitting process.

The auto.ARIMA algorithm uses the KPSS test to determine the best value for non_seasonal_d, which in this case is 1. When non_seasonal_d is 1, the auto.ARIMA algorithm trains 42 different candidate ARIMA models in parallel. In this example, all 42 candidate models are valid, so the output contains 42 rows, one for each candidate ARIMA model; in cases where some of the models aren't valid, they are excluded from the output. These candidate models are returned in ascending order by AIC. The model in the first row has the lowest AIC, and is considered the best model. The best model is saved as the final model and is used when you call functions such as ML.FORECAST on the model.
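
As a quick check, you can reproduce that ordering yourself. The following is a minimal sketch, assuming the summary DataFrame from the BigQuery DataFrames example above:

# A minimal sketch, assuming the `summary` DataFrame from the earlier example:
# sort the candidate models by AIC and inspect the best (lowest-AIC) one.
best = summary.sort_values("AIC").head(1)
print(best[["non_seasonal_p", "non_seasonal_d", "non_seasonal_q", "has_drift", "AIC"]])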

The seasonal_periods column contains information about the seasonal pattern identified in the time series data. It has nothing to do with the ARIMA modeling, and therefore it has the same value across all output rows. It reports a weekly pattern, which agrees with the results you saw if you chose to visualize the input data.

The has_holiday_effect, has_spikes_and_dips, and has_step_changes columns are only populated when decompose_time_series=TRUE. These columns also reflect information about the input time series data, and are not related to the ARIMA modeling. These columns also have the same values across all output rows.

The error_message column shows any errors that occurred during the auto.ARIMA fitting process. One possible reason for errors is when the selected non_seasonal_p, non_seasonal_d, non_seasonal_q, and has_drift columns are not able to stabilize the time series. To retrieve the error message of all the candidate models, set the show_all_candidate_models option to TRUE when you create the model.
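
For example, with the summary DataFrame from the BigQuery DataFrames example above, a minimal sketch for listing only the failed candidates might look like the following:

# A minimal sketch, assuming the `summary` DataFrame from the earlier example:
# convert it to a local pandas DataFrame and keep only the candidate models
# whose fitting produced an error message.
pdf = summary.to_pandas()
failed = pdf[pdf["error_message"].notna() & (pdf["error_message"] != "")]
print(failed[["non_seasonal_p", "non_seasonal_d", "non_seasonal_q", "error_message"]])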

For more information about the output columns, see the ML.ARIMA_EVALUATE function.

Inspect the model's coefficients

SQL

Inspect the time series model's coefficients by using the ML.ARIMA_COEFFICIENTS function.

Follow these steps to retrieve the model's coefficients:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.ARIMA_COEFFICIENTS(MODEL `bqml_tutorial.ga_arima_model`);

The ar_coefficients output column shows the model coefficients of the autoregressive (AR) part of the ARIMA model. Similarly, the ma_coefficients output column shows the model coefficients of the moving-average (MA) part of the ARIMA model. Both of these columns contain array values, whose lengths are equal to non_seasonal_p and non_seasonal_q, respectively. You saw in the output of the ML.ARIMA_EVALUATE function that the best model has a non_seasonal_p value of 2 and a non_seasonal_q value of 3. Therefore, in the ML.ARIMA_COEFFICIENTS output, the ar_coefficients value is a 2-element array and the ma_coefficients value is a 3-element array. The intercept_or_drift value is the constant term in the ARIMA model.

For more information about the output columns, see the ML.ARIMA_COEFFICIENTS function.

BigQuery DataFrames

Inspect the time series model's coefficients by using the coef_ function.

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

coef = model.coef_
print(coef.peek())

# Expected output:
#    ar_coefficients  ma_coefficients  intercept_or_drift
# 0     [0.40944762]    [-0.81168198]                 0.0

The ar_coefficients output column shows the model coefficients of the autoregressive (AR) part of the ARIMA model. Similarly, the ma_coefficients output column shows the model coefficients of the moving-average (MA) part of the ARIMA model. Both of these columns contain array values, whose lengths are equal to non_seasonal_p and non_seasonal_q, respectively.
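
As a sanity check, the following minimal sketch (assuming the coef and summary objects from the earlier examples) verifies that the coefficient array lengths match the order of the best candidate model:

# A minimal sketch, assuming the `coef` and `summary` objects from the earlier
# examples: the AR and MA coefficient arrays should have non_seasonal_p and
# non_seasonal_q elements, respectively, for the best (lowest-AIC) candidate.
coef_row = coef.to_pandas().iloc[0]
best_row = summary.to_pandas().sort_values("AIC").iloc[0]

assert len(coef_row["ar_coefficients"]) == best_row["non_seasonal_p"]
assert len(coef_row["ma_coefficients"]) == best_row["non_seasonal_q"]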

Use the model to forecast data

SQL

Forecast future time series values by using the ML.FORECAST function.

In the following GoogleSQL query, the STRUCT(30 AS horizon, 0.8 AS confidence_level) clause indicates that the query forecasts 30 future time points, and generates a prediction interval with an 80% confidence level.

Follow these steps to forecast data with the model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.FORECAST(MODEL `bqml_tutorial.ga_arima_model`,
                  STRUCT(30 AS horizon, 0.8 AS confidence_level));

    The results should look similar to the following:

    ML.FORECAST output.

BigQuery DataFrames

Forecast future time series values by using the predict function.

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

prediction = model.predict(horizon=30, confidence_level=0.8)

print(prediction.peek())
# Expected output:
#            forecast_timestamp  forecast_value  standard_error  confidence_level  prediction_interval_lower_bound  prediction_interval_upper_bound  confidence_interval_lower_bound  confidence_interval_upper_bound
# 11  2017-08-13 00:00:00+00:00     1845.439732      328.060405               0.8                      1424.772257                      2266.107208                      1424.772257                      2266.107208
# 29  2017-08-31 00:00:00+00:00     2615.993932      431.286628               0.8                      2062.960849                      3169.027015                      2062.960849                      3169.027015
# 7   2017-08-09 00:00:00+00:00     2639.285993      300.301186               0.8                      2254.213792                      3024.358193                      2254.213792                      3024.358193
# 25  2017-08-27 00:00:00+00:00     1853.735689      410.596551               0.8                      1327.233216                      2380.238162                      1327.233216                      2380.238162
# 1   2017-08-03 00:00:00+00:00      2621.33159      241.093355               0.8                      2312.180802                      2930.482379                      2312.180802                      2930.482379

The output rows are in chronological order by the forecast_timestamp column value. In time series forecasting, the prediction interval, as represented by the prediction_interval_lower_bound and prediction_interval_upper_bound column values, is as important as the forecast_value column value. The forecast_value value is the middle point of the prediction interval. The prediction interval depends on the standard_error and confidence_level column values.
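
To build intuition for that relationship, the following is a minimal sketch that reproduces the bounds under a normal approximation (bound = forecast_value ± z × standard_error). The exact internal computation isn't documented here, so treat this as illustrative:

# A hedged sketch: reproduce the prediction interval under a normal
# approximation. The z value is the two-sided normal quantile for the
# chosen confidence level.
from scipy.stats import norm

confidence_level = 0.8
z = norm.ppf((1 + confidence_level) / 2)  # ~1.28 for an 80% interval

forecast_value = 1845.44   # example values taken from the output above
standard_error = 328.06

lower = forecast_value - z * standard_error
upper = forecast_value + z * standard_error
print(lower, upper)  # ~1425, ~2266 -- close to the prediction interval columns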

For more information about the output columns, see the ML.FORECAST function.

Explain the forecasting results

SQL

You can get explainability metrics in addition to forecast data by using the ML.EXPLAIN_FORECAST function. The ML.EXPLAIN_FORECAST function forecasts future time series values and also returns all the separate components of the time series.

Similar to the ML.FORECAST function, the STRUCT(30 AS horizon, 0.8 AS confidence_level) clause used in the ML.EXPLAIN_FORECAST function indicates that the query forecasts 30 future time points and generates a prediction interval with 80% confidence.

Follow these steps to explain the model's results:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT
      *
    FROM
      ML.EXPLAIN_FORECAST(MODEL `bqml_tutorial.ga_arima_model`,
                          STRUCT(30 AS horizon, 0.8 AS confidence_level));

    The results should look similar to the following:

    The first nine output columns of forecasted data and forecast explanations. The tenth through seventeenth output columns of forecasted data and forecast explanations. The last six output columns of forecasted data and forecast explanations.

    The output rows are ordered chronologically by the time_series_timestamp column value.

    For more information about the output columns, see the ML.EXPLAIN_FORECAST function.

BigQuery DataFrames

You can get explainability metrics in addition to forecast data by using the predict_explain function. The predict_explain function forecasts future time series values and also returns all the separate components of the time series.

Similar to the predict function, the horizon=30, confidence_level=0.8 arguments used in the predict_explain function indicate that the query forecasts 30 future time points and generates a prediction interval with 80% confidence.

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

ex_pred = model.predict_explain(horizon=30, confidence_level=0.8)

print(ex_pred.head(4))
# Expected output:
#        time_series_timestamp  time_series_type  time_series_data  time_series_adjusted_data  standard_error  confidence_level  prediction_interval_lower_bound  prediction_interval_upper_bound        trend  seasonal_period_yearly  seasonal_period_quarterly  seasonal_period_monthly  seasonal_period_weekly  seasonal_period_daily  holiday_effect  spikes_and_dips  step_changes    residual
# 0  2016-08-01 00:00:00+00:00           history            1711.0                 505.716474      206.939556              <NA>                             <NA>                             <NA>          0.0                    <NA>                       <NA>                     <NA>              169.611938                   <NA>            <NA>             <NA>   1205.283526  336.104536
# 1  2016-08-02 00:00:00+00:00           history            2140.0                 623.137701      206.939556              <NA>                             <NA>                             <NA>   336.104428                    <NA>                       <NA>                     <NA>              287.033273                   <NA>            <NA>             <NA>   1205.283526  311.578773
# 2  2016-08-03 00:00:00+00:00           history            2890.0                1008.655091      206.939556              <NA>                             <NA>                             <NA>   563.514213                    <NA>                       <NA>                     <NA>              445.140878                   <NA>            <NA>             <NA>   1205.283526  676.061383
# 3  2016-08-04 00:00:00+00:00           history            3161.0                 1389.40959      206.939556              <NA>                             <NA>                             <NA>   986.317236                    <NA>                       <NA>                     <NA>              403.092354                   <NA>            <NA>             <NA>   1205.283526  566.306884
# 4  2016-08-05 00:00:00+00:00           history            2702.0                1394.395741      206.939556              <NA>                             <NA>                             <NA>  1248.707386                    <NA>                       <NA>                     <NA>              145.688355                   <NA>            <NA>             <NA>   1205.283526  102.320733
# 5  2016-08-06 00:00:00+00:00           history            1663.0                  437.09243      206.939556              <NA>                             <NA>                             <NA>   1188.59004                    <NA>                       <NA>                     <NA>              -751.49761                   <NA>            <NA>             <NA>   1205.283526   20.624044

If you would like to visualize the results, you can use Looker Studio as described in the Visualize the input data section to create a chart, using the following columns as metrics (a local plotting alternative is sketched after this list):

  • time_series_data
  • prediction_interval_lower_bound
  • prediction_interval_upper_bound
  • trend
  • seasonal_period_weekly
  • step_changes
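
If you prefer to plot locally instead of using Looker Studio, the following is a minimal sketch, assuming the ex_pred DataFrame from the BigQuery DataFrames example above and that matplotlib is installed:

# A minimal sketch, assuming the `ex_pred` DataFrame from the earlier example:
# plot the history, trend, weekly seasonality, and prediction interval locally.
import matplotlib.pyplot as plt

pdf = ex_pred.to_pandas().set_index("time_series_timestamp").sort_index()

plt.figure(figsize=(12, 4))
for column in [
    "time_series_data",
    "prediction_interval_lower_bound",
    "prediction_interval_upper_bound",
    "trend",
    "seasonal_period_weekly",
    "step_changes",
]:
    plt.plot(pdf.index, pdf[column], label=column)
plt.legend()
plt.show()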

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

  • You can delete the project you created.
  • Or you can keep the project and delete the dataset.

Delete your dataset

Deleting your project removes all datasets and all tables in the project. If you prefer to reuse the project, you can delete the dataset you created in this tutorial:

  1. If necessary, open the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the navigation, click the bqml_tutorial dataset you created.

  3. Click Delete dataset on the right side of the window. This action deletes the dataset, the table, and all the data.

  4. In the Delete dataset dialog box, confirm the delete command by typing the name of your dataset (bqml_tutorial) and then click Delete.

Delete your project

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next
