Forecast multiple time series with a multivariate model

This tutorial teaches you how to use a multivariate time series model to forecast the future value of a given column, based on the historical values of multiple input features.

This tutorial forecasts for multiple time series. Forecasted values are calculated for each time point, for each value in one or more specified columns. For example, if you wanted to forecast weather and specified a column containing state data, the forecasted data would contain forecasts for all time points for State A, then forecasted values for all time points for State B, and so forth. If you wanted to forecast weather and specified columns containing state and city data, the forecasted data would contain forecasts for all time points for State A and City A, then forecasted values for all time points for State A and City B, and so forth.

This tutorial uses data from the public bigquery-public-data.iowa_liquor_sales.sales and bigquery-public-data.covid19_weathersource_com.postal_code_day_history tables. The bigquery-public-data.iowa_liquor_sales.sales table contains liquor sales data collected from multiple cities in the state of Iowa. The bigquery-public-data.covid19_weathersource_com.postal_code_day_history table contains historical weather data, such as temperature and humidity, from around the world.

Before reading this tutorial, we highly recommend that you read Forecast a single time series with a multivariate model.

Objectives

This tutorial guides you through completing the following tasks:

  • Creating a BigQuery dataset to store the model
  • Creating a table of input data
  • Creating a multivariate time series model
  • Forecasting data with the model by using the ML.FORECAST function
  • Explaining the forecasting results by using the ML.EXPLAIN_FORECAST function
  • Evaluating forecasting accuracy by using the ML.EVALUATE function
  • Detecting anomalies by using the ML.DETECT_ANOMALIES function

Costs

This tutorial uses billable components of Google Cloud, including the following:

  • BigQuery
  • BigQuery ML

For more information about BigQuery costs, see the BigQuery pricing page.

For more information about BigQuery ML costs, see BigQuery ML pricing.

Before you begin

  1. Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
  2. In the Google Cloud console, on the project selector page, select or create a Google Cloud project.

    Roles required to select or create a project

    • Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
    • Create a project: To create a project, you need the Project Creator role (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.
    Note: If you don't plan to keep the resources that you create in this procedure, create a project instead of selecting an existing project. After you finish these steps, you can delete the project, removing all resources associated with the project.

    Go to project selector

  3. Verify that billing is enabled for your Google Cloud project.

  4. BigQuery is automatically enabled in new projects. To activate BigQuery in a pre-existing project, enable the BigQuery API.

    Roles required to enable APIs

    To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

    Enable the API

Create a dataset

Create a BigQuery dataset to store your ML model.

Console

  1. In the Google Cloud console, go to the BigQuery page.

    Go to the BigQuery page

  2. In the Explorer pane, click your project name.

  3. Click View actions > Create dataset.

  4. On the Create dataset page, do the following:

    • For Dataset ID, enter bqml_tutorial.

    • For Location type, select Multi-region, and then select US (multiple regions in United States).

    • Leave the remaining default settings as they are, and click Create dataset.

bq

To create a new dataset, use the bq mk command with the --location flag. For a full list of possible parameters, see the bq mk --dataset command reference.

  1. Create a dataset named bqml_tutorial with the data location set to US and a description of BigQuery ML tutorial dataset:

    bq --location=US mk -d \
        --description "BigQuery ML tutorial dataset." \
        bqml_tutorial

    Instead of using the --dataset flag, the command uses the -d shortcut. If you omit -d and --dataset, the command defaults to creating a dataset.

  2. Confirm that the dataset was created:

    bq ls

API

Call the datasets.insert method with a defined dataset resource.

{"datasetReference": {"datasetId": "bqml_tutorial"}}

BigQuery DataFrames

Before trying this sample, follow the BigQuery DataFrames setup instructions in the BigQuery quickstart using BigQuery DataFrames. For more information, see the BigQuery DataFrames reference documentation.

To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up ADC for a local development environment.

import google.cloud.bigquery

bqclient = google.cloud.bigquery.Client()
bqclient.create_dataset("bqml_tutorial", exists_ok=True)

Create a table of input data

Create a table of data that you can use to train and evaluate the model. This table combines columns from the bigquery-public-data.iowa_liquor_sales.sales and bigquery-public-data.covid19_weathersource_com.postal_code_day_history tables to analyze how weather affects the type and number of items ordered by liquor stores. You also create the following additional columns that you can use as input variables for the model:

  • date: the date of the order
  • store_number: the unique number of the store that placed the order
  • item_number: the unique number of the item that was ordered
  • bottles_sold: the number of bottles ordered of the associated item
  • temperature: the average temperature at the store location on the order date
  • humidity: the average humidity at the store location on the order date

Follow these steps to create the input data table:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    CREATE OR REPLACE TABLE `bqml_tutorial.iowa_liquor_sales_with_weather` AS
    WITH sales AS (
      SELECT
        date,
        store_number,
        item_number,
        bottles_sold,
        SAFE_CAST(SAFE_CAST(zip_code AS FLOAT64) AS INT64) AS zip_code
      FROM `bigquery-public-data.iowa_liquor_sales.sales` AS sales
      WHERE SAFE_CAST(zip_code AS FLOAT64) IS NOT NULL
    ),
    aggregated_sales AS (
      SELECT
        date,
        store_number,
        item_number,
        ANY_VALUE(zip_code) AS zip_code,
        SUM(bottles_sold) AS bottles_sold
      FROM sales
      GROUP BY date, store_number, item_number
    ),
    weather AS (
      SELECT
        date,
        SAFE_CAST(postal_code AS INT64) AS zip_code,
        avg_temperature_air_2m_f AS temperature,
        avg_humidity_specific_2m_gpkg AS humidity
      FROM `bigquery-public-data.covid19_weathersource_com.postal_code_day_history`
      WHERE country = 'US'
        AND SAFE_CAST(postal_code AS INT64) IS NOT NULL
    )
    SELECT
      aggregated_sales.date,
      aggregated_sales.store_number,
      aggregated_sales.item_number,
      aggregated_sales.bottles_sold,
      weather.temperature AS temperature,
      weather.humidity AS humidity
    FROM aggregated_sales
    LEFT JOIN weather
      ON aggregated_sales.zip_code = weather.zip_code
      AND aggregated_sales.date = weather.date;

Create the time series model

Create a time series model to forecast bottles sold for each combination of store ID and item ID, for each date in the bqml_tutorial.iowa_liquor_sales_with_weather table prior to September 1, 2022. Use the store location's average temperature and humidity on each date as features to evaluate during forecasting. There are about 1 million distinct combinations of item number and store number in the bqml_tutorial.iowa_liquor_sales_with_weather table, which means there are 1 million different time series to forecast.

Follow these steps to create the model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    CREATE OR REPLACE MODEL `bqml_tutorial.multi_time_series_arimax_model`
      OPTIONS (
        model_type = 'ARIMA_PLUS_XREG',
        time_series_id_col = ['store_number', 'item_number'],
        time_series_data_col = 'bottles_sold',
        time_series_timestamp_col = 'date')
    AS
    SELECT *
    FROM `bqml_tutorial.iowa_liquor_sales_with_weather`
    WHERE date < DATE('2022-09-01');

    The query takes approximately 38 minutes to complete, after which you can access the multi_time_series_arimax_model model. Because the query uses a CREATE MODEL statement to create a model, you don't see query results.

Use the model to forecast data

Forecast future time series values by using the ML.FORECAST function.

In the following GoogleSQL query, the STRUCT(5 AS horizon, 0.8 AS confidence_level) clause indicates that the query forecasts 5 future time points, and generates a prediction interval with an 80% confidence level.

The data signature of the input data for the ML.FORECAST function is the same as the data signature for the training data that you used to create the model. The bottles_sold column isn't included in the input, because that is the data the model is trying to forecast.

Follow these steps to forecast data with the model:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT *
    FROM ML.FORECAST(
      MODEL `bqml_tutorial.multi_time_series_arimax_model`,
      STRUCT(5 AS horizon, 0.8 AS confidence_level),
      (
        SELECT * EXCEPT (bottles_sold)
        FROM `bqml_tutorial.iowa_liquor_sales_with_weather`
        WHERE date >= DATE('2022-09-01')
      ));

    The results should look similar to the following:

    Forecasted data for the number of bottles sold.

    The output rows are in order by the store_number value, then by the item_number value, then in chronological order by the forecast_timestamp column value. In time series forecasting, the prediction interval, as represented by the prediction_interval_lower_bound and prediction_interval_upper_bound column values, is as important as the forecast_value column value. The forecast_value value is the middle point of the prediction interval. The prediction interval depends on the standard_error and confidence_level column values.

    For more information about the output columns, see ML.FORECAST.
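The forecast_value, standard_error, and confidence_level columns are related: the prediction interval is a symmetric band around the forecast value whose width grows with the standard error. The following sketch illustrates that relationship locally in plain Python, assuming a Gaussian approximation; it is not BigQuery ML's exact internal computation, and the prediction_interval helper is hypothetical:

```python
from statistics import NormalDist

def prediction_interval(forecast_value, standard_error, confidence_level):
    """Approximate a symmetric prediction interval around a forecast.

    Assumes Gaussian errors; BigQuery ML's internals may differ.
    """
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)  # two-sided quantile
    margin = z * standard_error
    return forecast_value - margin, forecast_value + margin

# With an 80% confidence level, the band covers roughly +/- 1.28
# standard errors around the forecast value.
lower, upper = prediction_interval(100.0, 10.0, 0.8)
print(lower, upper)
```

Note that the forecast value sits exactly at the midpoint of the interval, matching the behavior described above.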

Explain the forecasting results

You can get explainability metrics in addition to forecast data by using the ML.EXPLAIN_FORECAST function. The ML.EXPLAIN_FORECAST function forecasts future time series values and also returns all the separate components of the time series.

Similar to the ML.FORECAST function, the STRUCT(5 AS horizon, 0.8 AS confidence_level) clause used in the ML.EXPLAIN_FORECAST function indicates that the query forecasts 5 future time points and generates a prediction interval with 80% confidence.

The ML.EXPLAIN_FORECAST function provides both historical data and forecast data. To see only the forecast data, add the time_series_type option to the query and specify forecast as the option value.
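Conceptually, the separate components that ML.EXPLAIN_FORECAST returns combine additively to produce each forecast value. The toy numbers and component names below are assumptions for illustration only, not the function's exact output schema:

```python
# Hypothetical additive decomposition of one forecasted bottles_sold value,
# mimicking how ML.EXPLAIN_FORECAST separates a forecast into components.
components = {
    "trend": 80.0,                   # long-term level of the series
    "seasonal_period_weekly": 15.0,  # weekly seasonality effect
    "temperature_attribution": 3.5,  # external regressor contributions
    "humidity_attribution": 1.5,
}

# The components sum back to the forecast value for that time point.
forecast_value = sum(components.values())
print(forecast_value)  # 100.0
```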

Follow these steps to explain the model's results:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT *
    FROM ML.EXPLAIN_FORECAST(
      MODEL `bqml_tutorial.multi_time_series_arimax_model`,
      STRUCT(5 AS horizon, 0.8 AS confidence_level),
      (
        SELECT * EXCEPT (bottles_sold)
        FROM `bqml_tutorial.iowa_liquor_sales_with_weather`
        WHERE date >= DATE('2022-09-01')
      ));

    The results should look similar to the following:

    The first nine output columns of forecasted data and forecast explanations. The tenth through seventeenth output columns of forecasted data and forecast explanations. The last six output columns of forecasted data and forecast explanations.

    The output rows are ordered chronologically by the time_series_timestamp column value.

    For more information about the output columns, see ML.EXPLAIN_FORECAST.

Evaluate forecasting accuracy

Evaluate the forecasting accuracy of the model by running it on data that the model hasn't been trained on. You can do this by using the ML.EVALUATE function. The ML.EVALUATE function evaluates each time series independently.

In the following GoogleSQL query, the second SELECT statement provides the data with the future features, which are used to forecast the future values to compare to the actual data.
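Because each time series is evaluated independently, you get one row of metrics per (store_number, item_number) combination. The following sketch illustrates that grouping idea with a toy mean absolute error calculation in plain Python; the data values and the metric choice are assumptions for illustration, not ML.EVALUATE's exact output:

```python
from collections import defaultdict

# Toy rows: ((store_number, item_number), actual bottles_sold, forecast)
rows = [
    (("2190", "1001"), 10.0, 12.0),
    (("2190", "1001"), 8.0, 7.0),
    (("2191", "1002"), 5.0, 5.0),
    (("2191", "1002"), 9.0, 6.0),
]

# Group absolute errors by time series ID, then average per series,
# mirroring the one-row-of-metrics-per-series shape of the output.
errors = defaultdict(list)
for series_id, actual, forecast in rows:
    errors[series_id].append(abs(actual - forecast))

mae_per_series = {sid: sum(e) / len(e) for sid, e in errors.items()}
print(mae_per_series)
```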

Follow these steps to evaluate the model's accuracy:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT *
    FROM ML.EVALUATE(
      MODEL `bqml_tutorial.multi_time_series_arimax_model`,
      (
        SELECT *
        FROM `bqml_tutorial.iowa_liquor_sales_with_weather`
        WHERE date >= DATE('2022-09-01')
      ));

    The results should look similar to the following:

    Evaluation metrics for the model.

    For more information about the output columns, see ML.EVALUATE.

Use the model to detect anomalies

Detect anomalies in the training data by using the ML.DETECT_ANOMALIES function.

In the following query, the STRUCT(0.95 AS anomaly_prob_threshold) clause causes the ML.DETECT_ANOMALIES function to identify anomalous data points with a 95% confidence level.
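The threshold acts as a cutoff on the anomaly_probability value that the function computes for each data point: anything above the threshold is flagged as anomalous. A minimal sketch of that filtering step in plain Python (toy values; not BigQuery ML's internal logic):

```python
# Toy output rows: (bottles_sold, anomaly_probability)
rows = [(120, 0.10), (450, 0.97), (130, 0.50), (980, 0.99)]

anomaly_prob_threshold = 0.95

# A point is flagged as anomalous when its anomaly probability
# exceeds the configured threshold.
anomalies = [value for value, prob in rows if prob > anomaly_prob_threshold]
print(anomalies)  # [450, 980]
```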

Follow these steps to detect anomalies in the training data:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT *
    FROM ML.DETECT_ANOMALIES(
      MODEL `bqml_tutorial.multi_time_series_arimax_model`,
      STRUCT(0.95 AS anomaly_prob_threshold));

    The results should look similar to the following:

    Anomaly detection information for the training data.

    The anomaly_probability column in the results identifies the likelihood that a given bottles_sold column value is anomalous.

    For more information about the output columns, see ML.DETECT_ANOMALIES.

Detect anomalies in new data

Detect anomalies in the new data by providing input data to the ML.DETECT_ANOMALIES function. The new data must have the same data signature as the training data.

Follow these steps to detect anomalies in new data:

  1. In the Google Cloud console, go to the BigQuery page.

    Go to BigQuery

  2. In the query editor, paste in the following query and click Run:

    SELECT *
    FROM ML.DETECT_ANOMALIES(
      MODEL `bqml_tutorial.multi_time_series_arimax_model`,
      STRUCT(0.95 AS anomaly_prob_threshold),
      (
        SELECT *
        FROM `bqml_tutorial.iowa_liquor_sales_with_weather`
        WHERE date >= DATE('2022-09-01')
      ));

    The results should look similar to the following:

    Anomaly detection information for new data.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

  • You can delete the project you created.
  • Or you can keep the project and delete the dataset.

Delete your dataset

Deleting your project removes all datasets and all tables in the project. If you prefer to reuse the project, you can delete the dataset you created in this tutorial:

  1. If necessary, open the BigQuery page in the Google Cloud console.

    Go to the BigQuery page

  2. In the navigation, click the bqml_tutorial dataset you created.

  3. Click Delete dataset on the right side of the window. This action deletes the dataset, the table, and all the data.

  4. In the Delete dataset dialog, confirm the delete command by typing the name of your dataset (bqml_tutorial) and then click Delete.

Delete your project

To delete the project:

    Caution: Deleting a project has the following effects:
    • Everything in the project is deleted. If you used an existing project for the tasks in this document, when you delete it, you also delete any other work you've done in the project.
    • Custom project IDs are lost. When you created this project, you might have created a custom project ID that you want to use in the future. To preserve the URLs that use the project ID, such as an appspot.com URL, delete selected resources inside the project instead of deleting the whole project.

    If you plan to explore multiple architectures, tutorials, or quickstarts, reusing projects can help you avoid exceeding project quota limits.

  1. In the Google Cloud console, go to the Manage resources page.

    Go to Manage resources

  2. In the project list, select the project that you want to delete, and then click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.