The CREATE MODEL statement for importing XGBoost models

This document describes theCREATE MODEL statement for importingXGBoost models intoBigQuery by using SQL. Alternatively, you can use theGoogle Cloud console user interface tocreate a model by using a UI(Preview) instead of constructing the SQLstatement yourself.

For more information about supported SQL statements and functions for thismodel, seeEnd-to-end user journeys for imported models.

CREATE MODEL syntax

{CREATE MODEL |CREATE MODEL IF NOT EXISTS |CREATE OR REPLACE MODEL}model_name[INPUT(field_namefield_type, …) OUTPUT(field_namefield_type, …)]OPTIONS(MODEL_TYPE = 'XGBOOST', MODEL_PATH =string_value  [, KMS_KEY_NAME =string_value ]);

CREATE MODEL

Creates and trains a new model in the specified dataset. If the model nameexists,CREATE MODEL returns an error.

CREATE MODEL IF NOT EXISTS

Creates and trains a new model only if the model doesn't exist in thespecified dataset.

CREATE OR REPLACE MODEL

Creates and trains a model and replaces an existing model with the same name inthe specified dataset.

model_name

The name of the model you're creating or replacing. The modelname must be unique in the dataset: no other model or table can have the samename. The model name must follow the same naming rules as aBigQuery table. A model name can:

  • Contain up to 1,024 characters
  • Contain letters (upper or lower case), numbers, and underscores

model_name is case-sensitive.

If you don't have a default project configured, then you must prepend theproject ID to the model name in the following format, including backticks:

`[PROJECT_ID].[DATASET].[MODEL]`

For example, `myproject.mydataset.mymodel`.

INPUT OUTPUT clause

TheINPUT OUTPUT clause lets you specify the model input output schemainformation when you create the XGBoost model.

INPUT OUTPUT is optional only iffeature_names andfeature_typesare both specified in the model file. For more information about how to storefeature_names andfeature_types in the XGBoost model file,seeIntroduction to Model IO.

field_name

Usefield_name to define the name of an input feature or a model output.

If thefeature_names field is populated in the XGBoost model file, theinput field names must be identical to the names in thefeature_names field.For more information, see the XGBoost modelJSON Schema.

field_type

Usefield_type to specify the data type of an input feature or a modeloutput. Input data types must be one of the supportednumeric types.The output data type must beFLOAT64.

Example

INPUT(f1INT64,f2FLOAT64,f3FLOAT64)OUTPUT(predicted_labelFLOAT64)

MODEL_TYPE

Syntax

MODEL_TYPE='XGBOOST'

Description

Specifies the model type. This option is required.

MODEL_PATH

Syntax

MODEL_PATH=string_value

Description

Specifies theCloud Storage URIof the XGBoost model to import. This option is required.

Arguments

ASTRING value specifying the URI of a Cloud Storage bucket that containsthe model to import.

BigQuery ML imports the model from Cloud Storage by using thecredentials of the user who runs theCREATE MODEL statement.

Example

MODEL_PATH='gs://bucket/path/to/xgboost_model/*'

KMS_KEY_NAME

Syntax

KMS_KEY_NAME =string_value

Description

The Cloud Key Management Servicecustomer-managed encryption key (CMEK) touse to encrypt the model.

Arguments

ASTRING value containing the fully-qualified name of the CMEK. For example,

'projects/my_project/locations/my_location/keyRings/my_ring/cryptoKeys/my_key'

Locations

For information about supported locations, seeLocations for non-remote models.

Limitations

Imported XGBoost models have the following limitations:

  • The XGBoost model must already exist before it can be imported intoBigQuery.
  • Models must be stored in Cloud Storage.
  • XGBoost models must be inBST orJSON format.
  • You can only use XGBoost models with theML.PREDICTandML.FEATURE_IMPORTANCEfunctions.
  • Models are limited to 250 MB in size.
  • The memory limit to load and run the XGBoost model is 840 MB. You can reduce the modelsize by using fewer trees or shallower tree depth, or by using the XGBoostlibrary's defaultsave_model method to save the models.
  • BigQuery ML uses theXGBoost 1.5.1library to load and make predictions on XGBoost models. Forward compatibilityfor models saved with XGBoost version 1.6.0 or newer is not guaranteed.
  • BigQuery XGBoost models only supportnumeric typesas input data types andFLOAT64as the output data type.
  • Categorical features that use XGBoost built-incategorical datasupport are treated as integer inputs.
  • BigQuery XGBoost models only support a single scalar or arrayoutput.Multiple outputsaren't supported.
  • You can only use an imported XGBoost model with an objecttable when you use capacity-based pricing through reservations. On-demandpricing isn't supported.

Examples

The following examples show how to create different types of importedXGBoost models.

Import a model and specify input and output columns

The following example imports a XGBoost model into BigQuery as aBigQuery model. The example assumes the following:

  • There is an existing XGBoost model located atgs://bucket-name/xgboost-model/*.
  • The model file is inBST format or inJSON` format.
  • The model file doesn't contain information about inputfeature_names andfeature_types.
CREATEORREPLACEMODEL`project_id.mydataset.mymodel`INPUT(f1float64,f2float64,f3float64,f4float64)OUTPUT(predicted_labelfloat64)OPTIONS(MODEL_TYPE='XGBOOST',MODEL_PATH='gs://bucket-name/xgboost-model/*')

Import a model that already contains input and output columns

The following example imports a XGBoost model into BigQuery as aBigQuery model. The example assumes the following:

  • There is an existing XGBoost model located atgs://bucket-name/xgboost-model/*.
  • The model file is inBST format or inJSON format.
  • The model file contains information about inputfeature_names andfeature_types.
CREATEORREPLACEMODEL`project_id.mydataset.mymodel`OPTIONS(MODEL_TYPE='XGBOOST',MODEL_PATH='gs://bucket-name/xgboost-model/*')

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.