Hyperparameter tuning overview

In machine learning, hyperparameter tuning identifies a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a model argument whose value is set before the learning process begins. By contrast, the values of other parameters, such as the coefficients of a linear model, are learned.

Hyperparameter tuning lets you spend less time manually iterating over hyperparameters and more time focusing on exploring insights from data.

You can specify hyperparameter tuning options for the following model types:

For these types of models, hyperparameter tuning is enabled when you specify a value for the NUM_TRIALS option in the CREATE MODEL statement.

To try running hyperparameter tuning on a linear regression model, see Use BigQuery ML hyperparameter tuning to improve model performance.

The following models also support hyperparameter tuning but don't allow you to specify particular values:

Locations

For information about which locations support hyperparameter tuning, see BigQuery ML locations.

Set hyperparameters

To tune a hyperparameter, you must specify a range of values for that hyperparameter that the model can use for a set of trials. You can do this by using one of the following keywords when setting the hyperparameter in the CREATE MODEL statement, instead of providing a single value:

  • HPARAM_RANGE: A two-element ARRAY(FLOAT64) value that defines the minimum and maximum bounds of the search space of continuous values for a hyperparameter. Use this option to specify a range of values for a hyperparameter, for example LEARN_RATE = HPARAM_RANGE(0.0001, 1.0).

  • HPARAM_CANDIDATES: An ARRAY(STRUCT) value that specifies the set of discrete values for the hyperparameter. Use this option to specify a set of values for a hyperparameter, for example OPTIMIZER = HPARAM_CANDIDATES(['ADAGRAD', 'SGD', 'FTRL']).
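
For example, the following sketch combines both keywords in a single tuning job for a DNN classifier. The model, table, and label column names (mydataset.my_tuned_model, mydataset.training_data, label_col) are placeholders, not names from this page:

  CREATE OR REPLACE MODEL `mydataset.my_tuned_model`
    OPTIONS (
      MODEL_TYPE = 'DNN_CLASSIFIER',
      INPUT_LABEL_COLS = ['label_col'],
      NUM_TRIALS = 20,                                            -- enables hyperparameter tuning
      LEARN_RATE = HPARAM_RANGE(0.001, 0.1),                      -- continuous search space
      OPTIMIZER = HPARAM_CANDIDATES(['ADAGRAD', 'SGD', 'FTRL'])   -- discrete search space
    ) AS
  SELECT * FROM `mydataset.training_data`;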

Hyperparameters and objectives

The following table lists the supported hyperparameters and objectives for each model type that supports hyperparameter tuning:

| Model type | Hyperparameter objectives | Hyperparameter | Valid range | Default range | Scale type |
| --- | --- | --- | --- | --- | --- |
| LINEAR_REG | MEAN_ABSOLUTE_ERROR, MEAN_SQUARED_ERROR, MEAN_SQUARED_LOG_ERROR, MEDIAN_ABSOLUTE_ERROR, R2_SCORE (default), EXPLAINED_VARIANCE | L1_REG | (0, ∞] | (0, 10] | LOG |
| | | L2_REG | (0, ∞] | (0, 10] | LOG |
| LOGISTIC_REG | PRECISION, RECALL, ACCURACY, F1_SCORE, LOG_LOSS, ROC_AUC (default) | L1_REG | (0, ∞] | (0, 10] | LOG |
| | | L2_REG | (0, ∞] | (0, 10] | LOG |
| KMEANS | DAVIES_BOULDIN_INDEX | NUM_CLUSTERS | [2, 100] | [2, 10] | LINEAR |
| MATRIX_FACTORIZATION (explicit) | MEAN_SQUARED_ERROR | NUM_FACTORS | [2, 200] | [2, 20] | LINEAR |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| MATRIX_FACTORIZATION (implicit) | MEAN_AVERAGE_PRECISION (default), MEAN_SQUARED_ERROR, NORMALIZED_DISCOUNTED_CUMULATIVE_GAIN, AVERAGE_RANK | NUM_FACTORS | [2, 200] | [2, 20] | LINEAR |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| | | WALS_ALPHA | [0, ∞) | [0, 100] | LINEAR |
| AUTOENCODER | MEAN_ABSOLUTE_ERROR, MEAN_SQUARED_ERROR (default), MEAN_SQUARED_LOG_ERROR | LEARN_RATE | [0, 1] | [0, 1] | LOG |
| | | BATCH_SIZE | (0, ∞) | [16, 1024] | LOG |
| | | L1_REG | (0, ∞) | (0, 10] | LOG |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| | | L1_REG_ACTIVATION | (0, ∞) | (0, 10] | LOG |
| | | DROPOUT | [0, 1) | [0, 0.8] | LINEAR |
| | | HIDDEN_UNITS | Array of [1, ∞) | N/A | N/A |
| | | OPTIMIZER | {ADAM, ADAGRAD, FTRL, RMSPROP, SGD} | {ADAM, ADAGRAD, FTRL, RMSPROP, SGD} | N/A |
| | | ACTIVATION_FN | {RELU, RELU6, CRELU, ELU, SELU, SIGMOID, TANH} | N/A | N/A |
| DNN_CLASSIFIER | PRECISION, RECALL, ACCURACY, F1_SCORE, LOG_LOSS, ROC_AUC (default) | BATCH_SIZE | (0, ∞) | [16, 1024] | LOG |
| | | DROPOUT | [0, 1) | [0, 0.8] | LINEAR |
| | | HIDDEN_UNITS | Array of [1, ∞) | N/A | N/A |
| | | LEARN_RATE | [0, 1] | [0, 1] | LINEAR |
| | | OPTIMIZER | {ADAM, ADAGRAD, FTRL, RMSPROP, SGD} | {ADAM, ADAGRAD, FTRL, RMSPROP, SGD} | N/A |
| | | L1_REG | (0, ∞) | (0, 10] | LOG |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| | | ACTIVATION_FN | {RELU, RELU6, CRELU, ELU, SELU, SIGMOID, TANH} | N/A | N/A |
| DNN_REGRESSOR | MEAN_ABSOLUTE_ERROR, MEAN_SQUARED_ERROR, MEAN_SQUARED_LOG_ERROR, MEDIAN_ABSOLUTE_ERROR, R2_SCORE (default), EXPLAINED_VARIANCE | Same hyperparameters, ranges, and scale types as DNN_CLASSIFIER | | | |
| DNN_LINEAR_COMBINED_CLASSIFIER | PRECISION, RECALL, ACCURACY, F1_SCORE, LOG_LOSS, ROC_AUC (default) | BATCH_SIZE | (0, ∞) | [16, 1024] | LOG |
| | | DROPOUT | [0, 1) | [0, 0.8] | LINEAR |
| | | HIDDEN_UNITS | Array of [1, ∞) | N/A | N/A |
| | | L1_REG | (0, ∞) | (0, 10] | LOG |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| | | ACTIVATION_FN | {RELU, RELU6, CRELU, ELU, SELU, SIGMOID, TANH} | N/A | N/A |
| DNN_LINEAR_COMBINED_REGRESSOR | MEAN_ABSOLUTE_ERROR, MEAN_SQUARED_ERROR, MEAN_SQUARED_LOG_ERROR, MEDIAN_ABSOLUTE_ERROR, R2_SCORE (default), EXPLAINED_VARIANCE | Same hyperparameters, ranges, and scale types as DNN_LINEAR_COMBINED_CLASSIFIER | | | |
| BOOSTED_TREE_CLASSIFIER | PRECISION, RECALL, ACCURACY, F1_SCORE, LOG_LOSS, ROC_AUC (default) | LEARN_RATE | [0, ∞) | [0, 1] | LINEAR |
| | | L1_REG | (0, ∞) | (0, 10] | LOG |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| | | DROPOUT | [0, 1] | N/A | LINEAR |
| | | MAX_TREE_DEPTH | [1, 20] | [1, 10] | LINEAR |
| | | SUBSAMPLE | (0, 1] | (0, 1] | LINEAR |
| | | MIN_SPLIT_LOSS | [0, ∞) | N/A | LINEAR |
| | | NUM_PARALLEL_TREE | [1, ∞) | N/A | LINEAR |
| | | MIN_TREE_CHILD_WEIGHT | [0, ∞) | N/A | LINEAR |
| | | COLSAMPLE_BYTREE | [0, 1] | N/A | LINEAR |
| | | COLSAMPLE_BYLEVEL | [0, 1] | N/A | LINEAR |
| | | COLSAMPLE_BYNODE | [0, 1] | N/A | LINEAR |
| | | BOOSTER_TYPE | {GBTREE, DART} | N/A | N/A |
| | | DART_NORMALIZE_TYPE | {TREE, FOREST} | N/A | N/A |
| | | TREE_METHOD | {AUTO, EXACT, APPROX, HIST} | N/A | N/A |
| BOOSTED_TREE_REGRESSOR | MEAN_ABSOLUTE_ERROR, MEAN_SQUARED_ERROR, MEAN_SQUARED_LOG_ERROR, MEDIAN_ABSOLUTE_ERROR, R2_SCORE (default), EXPLAINED_VARIANCE | Same hyperparameters, ranges, and scale types as BOOSTED_TREE_CLASSIFIER | | | |
| RANDOM_FOREST_CLASSIFIER | PRECISION, RECALL, ACCURACY, F1_SCORE, LOG_LOSS, ROC_AUC (default) | L1_REG | (0, ∞) | (0, 10] | LOG |
| | | L2_REG | (0, ∞) | (0, 10] | LOG |
| | | MAX_TREE_DEPTH | [1, 20] | [1, 20] | LINEAR |
| | | SUBSAMPLE | (0, 1) | (0, 1) | LINEAR |
| | | MIN_SPLIT_LOSS | [0, ∞) | N/A | LINEAR |
| | | NUM_PARALLEL_TREE | [2, ∞) | [2, 200] | LINEAR |
| | | MIN_TREE_CHILD_WEIGHT | [0, ∞) | N/A | LINEAR |
| | | COLSAMPLE_BYTREE | [0, 1] | N/A | LINEAR |
| | | COLSAMPLE_BYLEVEL | [0, 1] | N/A | LINEAR |
| | | COLSAMPLE_BYNODE | [0, 1] | N/A | LINEAR |
| | | TREE_METHOD | {AUTO, EXACT, APPROX, HIST} | N/A | N/A |
| RANDOM_FOREST_REGRESSOR | MEAN_ABSOLUTE_ERROR, MEAN_SQUARED_ERROR, MEAN_SQUARED_LOG_ERROR, MEDIAN_ABSOLUTE_ERROR, R2_SCORE (default), EXPLAINED_VARIANCE | Same hyperparameters, ranges, and scale types as RANDOM_FOREST_CLASSIFIER | | | |

Most LOG scale hyperparameters use an open lower boundary of 0. You can still set 0 as the lower boundary by using the HPARAM_RANGE keyword to set the hyperparameter range. For example, in a boosted tree classifier model, you could set the range for the L1_REG hyperparameter as L1_REG = HPARAM_RANGE(0, 5). A value of 0 gets converted to 1e-14.
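
As a sketch, that option might appear in a CREATE MODEL statement as follows; the model, table, and label column names are placeholders:

  CREATE OR REPLACE MODEL `mydataset.my_boosted_tree`
    OPTIONS (
      MODEL_TYPE = 'BOOSTED_TREE_CLASSIFIER',
      INPUT_LABEL_COLS = ['label'],
      NUM_TRIALS = 10,
      L1_REG = HPARAM_RANGE(0, 5)   -- 0 is accepted here and converted to 1e-14
    ) AS
  SELECT * FROM `mydataset.training_data`;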

Conditional hyperparameters are supported. For example, in a boosted tree regressor model, you can only tune the DART_NORMALIZE_TYPE hyperparameter when the value of the BOOSTER_TYPE hyperparameter is DART. In this case, you specify both search spaces and the conditions are handled automatically, as shown in the following example:

BOOSTER_TYPE = HPARAM_CANDIDATES(['DART', 'GBTREE'])
DART_NORMALIZE_TYPE = HPARAM_CANDIDATES(['TREE', 'FOREST'])
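
For context, the two options might appear together in a full statement such as the following sketch; the model, table, and label column names are placeholders:

  CREATE OR REPLACE MODEL `mydataset.my_boosted_tree_reg`
    OPTIONS (
      MODEL_TYPE = 'BOOSTED_TREE_REGRESSOR',
      INPUT_LABEL_COLS = ['label'],
      NUM_TRIALS = 20,
      BOOSTER_TYPE = HPARAM_CANDIDATES(['DART', 'GBTREE']),
      DART_NORMALIZE_TYPE = HPARAM_CANDIDATES(['TREE', 'FOREST'])   -- only used in trials where BOOSTER_TYPE is DART
    ) AS
  SELECT * FROM `mydataset.training_data`;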

Search starting point

If you don't specify a search space for a hyperparameter by using HPARAM_RANGE or HPARAM_CANDIDATES, the search starts from the default value of that hyperparameter, as documented in the CREATE MODEL topic for that model type. For example, if you are running hyperparameter tuning for a boosted tree model, and you don't specify a value for the L1_REG hyperparameter, then the search starts from 0, the default value.

If you specify a search space for a hyperparameter by using HPARAM_RANGE or HPARAM_CANDIDATES, the search starting point depends on whether the specified search space includes the default value for that hyperparameter, as documented in the CREATE MODEL topic for that model type:

  • If the specified range contains the default value, that's where the search starts. For example, if you are running hyperparameter tuning for an implicit matrix factorization model, and you specify the value [20, 30, 40, 50] for the WALS_ALPHA hyperparameter, then the search starts at 40, the default value.
  • If the specified range doesn't contain the default value, the search starts from the point in the specified range that is closest to the default value. For example, if you specify the value [10, 20, 30] for the WALS_ALPHA hyperparameter, then the search starts from 30, which is the closest value to the default value of 40.
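
As a sketch, the first case might be expressed as follows, assuming the standard matrix factorization options user_col, item_col, rating_col, and feedback_type; the model and table names are placeholders:

  CREATE OR REPLACE MODEL `mydataset.my_mf_model`
    OPTIONS (
      MODEL_TYPE = 'MATRIX_FACTORIZATION',
      FEEDBACK_TYPE = 'IMPLICIT',
      USER_COL = 'user_id',
      ITEM_COL = 'item_id',
      RATING_COL = 'rating',
      NUM_TRIALS = 10,
      WALS_ALPHA = HPARAM_CANDIDATES([20, 30, 40, 50])   -- contains the default value 40, so the search starts there
    ) AS
  SELECT user_id, item_id, rating FROM `mydataset.ratings`;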

Data split

When you specify a value for the NUM_TRIALS option, the service identifies that you are doing hyperparameter tuning and automatically performs a 3-way split on the input data to divide it into training, evaluation, and test sets. By default, the input data is randomized and then split 80% for training, 10% for evaluation, and 10% for testing.

The training and evaluation sets are used in each trial training, the same as in models that don't use hyperparameter tuning. The trial hyperparameter suggestions are calculated based on the model evaluation metrics for that model type. At the end of each trial training, the test set is used to test the trial and record its metrics in the model. This ensures the objectivity of the final reported evaluation metrics by using data that has not yet been analyzed by the model. Evaluation data is used to calculate the intermediate metrics for hyperparameter suggestion, while the test data is used to calculate the final, objective model metrics.

If you want to use only a training set, specify NO_SPLIT for the DATA_SPLIT_METHOD option of the CREATE MODEL statement.

If you want to use only training and evaluation sets, specify 0 for the DATA_SPLIT_TEST_FRACTION option of the CREATE MODEL statement. When the test set is empty, the evaluation set is used as the test set for the final evaluation metrics reporting.

The metrics from models that are generated from a normal training job and those from a hyperparameter tuning training job are only comparable when the data split fractions are equal. For example, the following models are comparable:

  • Non-hyperparameter tuning: DATA_SPLIT_METHOD='RANDOM', DATA_SPLIT_EVAL_FRACTION=0.2
  • Hyperparameter tuning: DATA_SPLIT_METHOD='RANDOM', DATA_SPLIT_EVAL_FRACTION=0.2, DATA_SPLIT_TEST_FRACTION=0
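
Expressed as full statements, the two comparable configurations might look like the following sketch; the model, table, and label column names are placeholders:

  -- Normal training job, no hyperparameter tuning
  CREATE OR REPLACE MODEL `mydataset.model_no_tuning`
    OPTIONS (
      MODEL_TYPE = 'LINEAR_REG',
      INPUT_LABEL_COLS = ['label'],
      DATA_SPLIT_METHOD = 'RANDOM',
      DATA_SPLIT_EVAL_FRACTION = 0.2
    ) AS
  SELECT * FROM `mydataset.training_data`;

  -- Hyperparameter tuning job with an equivalent data split
  CREATE OR REPLACE MODEL `mydataset.model_with_tuning`
    OPTIONS (
      MODEL_TYPE = 'LINEAR_REG',
      INPUT_LABEL_COLS = ['label'],
      NUM_TRIALS = 10,
      DATA_SPLIT_METHOD = 'RANDOM',
      DATA_SPLIT_EVAL_FRACTION = 0.2,
      DATA_SPLIT_TEST_FRACTION = 0
    ) AS
  SELECT * FROM `mydataset.training_data`;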

Performance

Model performance when using hyperparameter tuning is typically no worse than model performance when using the default search space and not using hyperparameter tuning. A model that uses the default search space and doesn't use hyperparameter tuning always uses the default hyperparameters in the first trial.

To confirm the model performance improvements provided by hyperparameter tuning, compare the optimal trial for the hyperparameter tuning model to the first trial for the non-hyperparameter tuning model.

Transfer learning

Transfer learning is enabled by default when you set the HPARAM_TUNING_ALGORITHM option in the CREATE MODEL statement to VIZIER_DEFAULT. The hyperparameter tuning for a model benefits by learning from previously tuned models if it meets the following requirements:

  • It has the same model type as previously tuned models.
  • It resides in the same project as previously tuned models.
  • It uses the same hyperparameter search space OR a subset of the hyperparameter search space of previously tuned models. A subset uses the same hyperparameter names and types, but doesn't have to have the same ranges. For example, (a:[0, 10]) is considered a subset of (a:[-1, 1], b:[0, 1]).

Transfer learning doesn't require that the input data be the same.

Transfer learning helps solve the cold start problem where the system performs random exploration during the first trial batch. Transfer learning provides the system with some initial knowledge about the hyperparameters and their objectives. To continuously improve the model quality, always train a new hyperparameter tuning model with the same or a subset of the hyperparameters.
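
For example, a follow-up tuning job that reuses a subset of an earlier job's search space might look like the following sketch; both models are assumed to have the same model type and to reside in the same project, and the model and table names are placeholders:

  -- First tuning job: searches over L1_REG and L2_REG
  CREATE OR REPLACE MODEL `mydataset.tuned_model_v1`
    OPTIONS (
      MODEL_TYPE = 'LINEAR_REG',
      INPUT_LABEL_COLS = ['label'],
      NUM_TRIALS = 20,
      HPARAM_TUNING_ALGORITHM = 'VIZIER_DEFAULT',
      L1_REG = HPARAM_RANGE(0, 10),
      L2_REG = HPARAM_RANGE(0, 10)
    ) AS
  SELECT * FROM `mydataset.training_data`;

  -- Later tuning job: a subset of the same search space (L1_REG only, with a different range),
  -- so it can benefit from transfer learning from the earlier job
  CREATE OR REPLACE MODEL `mydataset.tuned_model_v2`
    OPTIONS (
      MODEL_TYPE = 'LINEAR_REG',
      INPUT_LABEL_COLS = ['label'],
      NUM_TRIALS = 20,
      HPARAM_TUNING_ALGORITHM = 'VIZIER_DEFAULT',
      L1_REG = HPARAM_RANGE(0, 5)
    ) AS
  SELECT * FROM `mydataset.new_training_data`;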

Transfer learning helps hyperparameter tuning converge faster, instead of helping submodels to converge.

Error handling

Hyperparameter tuning handles errors in the following ways:

  • Cancellation: If a training job is cancelled while running, then all successful trials remain usable.

  • Invalid input: If the user input is invalid, then the service returns a user error.

  • Invalid hyperparameters: If the hyperparameters are invalid for a trial, then the trial is skipped and marked as INFEASIBLE in the output from the ML.TRIAL_INFO function.

  • Trial internal error: If more than 10% of the NUM_TRIALS value fail due to INTERNAL_ERROR, then the training job stops and returns a user error.

  • If less than 10% of the NUM_TRIALS value fail due to INTERNAL_ERROR, the training continues with the failed trials marked as FAILED in the output from the ML.TRIAL_INFO function.

Model serving functions

You can use output models from hyperparameter tuning with a number of existing model serving functions. To use these functions, follow these rules:

  • When the function takes input data, only the result from one trial is returned. By default this is the optimal trial, but you can also choose a particular trial by specifying the TRIAL_ID as an argument for the given function. You can get the TRIAL_ID from the output of the ML.TRIAL_INFO function. The following functions are supported (see the sketch after this list):

  • When the function doesn't take input data, all trial results are returned, and the first output column is TRIAL_ID. The following functions are supported:
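
As an illustration of the first case, a prediction that selects a specific trial might look like the following sketch; the model and table names are placeholders, and the trial is chosen through the trial_id setting in the ML.PREDICT settings struct:

  SELECT *
  FROM ML.PREDICT(
    MODEL `mydataset.my_tuned_model`,
    TABLE `mydataset.new_data`,
    STRUCT(3 AS trial_id)   -- omit this struct to use the optimal trial
  );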

The output from ML.FEATURE_INFO doesn't change, because all trials share the same input data.

Evaluation metrics from ML.EVALUATE and ML.TRIAL_INFO can be different because of the way input data is split. By default, ML.EVALUATE runs against the test data, while ML.TRIAL_INFO runs against the evaluation data. For more information, see Data split.

Unsupported functions

The ML.TRAINING_INFO function returns information for each iteration, and iteration results aren't saved in hyperparameter tuning models. Trial results are saved instead. You can use the ML.TRIAL_INFO function to get information about trial results.
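
For example, a minimal query against a tuned model (the model name is a placeholder) might look like this:

  SELECT *
  FROM ML.TRIAL_INFO(MODEL `mydataset.my_tuned_model`)
  ORDER BY trial_id;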

Model export

You can export models created with hyperparameter tuning to Cloud Storage locations by using the EXPORT MODEL statement. You can export the default optimal trial or any specified trial.
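
For example, exporting the default optimal trial to a Cloud Storage path might look like the following sketch; the model name and bucket path are placeholders, and the option for selecting a specific trial is described in the EXPORT MODEL reference rather than shown here:

  EXPORT MODEL `mydataset.my_tuned_model`
    OPTIONS (URI = 'gs://my-bucket/exported_model/');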

Pricing

The cost of hyperparameter tuning training is the sum of the cost of all executed trials. The pricing of a trial is consistent with the existing BigQuery ML pricing model.

FAQ

This section provides answers to some frequently asked questions about hyperparameter tuning.

How many trials do I need to tune a model?

We recommend using at least 10 trials for one hyperparameter, so the total number of trials should be at least 10 * num_hyperparameters. If you are using the default search space, refer to the Hyperparameter column in the Hyperparameters and objectives table for the number of hyperparameters tuned by default for a given model type.

What if I don't see performance improvements by using hyperparameter tuning?

Make sure you follow the guidance in this document to get a fair comparison. If you still don't see performance improvements, it might mean the default hyperparameters already work well for you. You might want to focus on feature engineering or try other model types before trying another round of hyperparameter tuning.

What if I want to continue tuning a model?

Train a new hyperparameter tuning model with the same search space. The built-in transfer learning helps to continue tuning based on your previously tuned models.

Do I need to retrain the model with all data and the optimal hyperparameters?

It depends on the following factors:

  • K-means models already use all data as the training data, so there's no need to retrain the model.

  • For matrix factorization models, you can retrain the model with the selected hyperparameters and all input data for better coverage of users and items.

  • For all other model types, retraining is usually unnecessary. The service already keeps 80% of the input data for training during the default random data split. You can still retrain the model with more training data and the selected hyperparameters if your dataset is small, but leaving little evaluation data for early stopping might worsen overfitting.

What's next
