Create a hyperparameter tuning job

Hyperparameters are variables that govern the process of training a model, such as batch size or the number of hidden layers in a deep neural network. Hyperparameter tuning searches for the best combination of hyperparameter values by optimizing metric values across a series of trials. Metrics are scalar summaries that you add to your trainer, such as model accuracy.

Learn more about hyperparameter tuning on Vertex AI. For a step-by-step example, refer to the Vertex AI: Hyperparameter Tuning codelab.

This page shows you how to:

  • Prepare your training application.
  • Create a hyperparameter tuning job.
  • Configure a hyperparameter tuning job.
  • Manage hyperparameter tuning jobs.

Prepare your training application

In a hyperparameter tuning job, Vertex AI creates trials of your training job with different sets of hyperparameters and evaluates the effectiveness of a trial using the metrics you specified. Vertex AI passes hyperparameter values to your training application as command-line arguments. For Vertex AI to evaluate the effectiveness of a trial, your training application must report your metrics to Vertex AI.

The following sections describe:

  • How Vertex AI passes hyperparameters to your training application.
  • Options for passing metrics from your training application to Vertex AI.

To learn more about the requirements for serverless training applications that run on Vertex AI, read Training code requirements.

Handle the command-line arguments for the hyperparameters you want to tune

Vertex AI sets command-line arguments when it calls your training application. Make use of the command-line arguments in your code:

  1. Define a name for each hyperparameter argument and parse it using whatever argument parser you prefer, such as argparse. Use the same argument names when configuring your hyperparameter training job.

    For example, if your training application is a Python module named my_trainer and you are tuning a hyperparameter named learning_rate, Vertex AI starts each trial with a command like the following:

    python3 -m my_trainer --learning_rate learning-rate-in-this-trial

    Vertex AI determines the learning-rate-in-this-trial value and passes it in using the learning_rate argument.

  2. Assign the values from the command-line arguments to the hyperparameters in your training code.

Learn more about the requirements for parsing command-line arguments.
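For illustration, the following is a minimal sketch of such an entry point, assuming a hypothetical Python module my_trainer that tunes only learning_rate; the flag name is the part that must match your job configuration.

import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Example trainer")
    # The flag name must match the hyperparameter name configured in the tuning job.
    parser.add_argument("--learning_rate", type=float, default=0.01)
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    # Use the tuned value when building the optimizer and training loop.
    print(f"Training with learning_rate={args.learning_rate}")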

Report your metrics to Vertex AI

To report your metrics to Vertex AI, use the cloudml-hypertune Python package. This library provides helper functions for reporting metrics to Vertex AI.

Learn more about reporting hyperparameter metrics.
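As a minimal sketch, the following reports a single metric value with cloudml-hypertune; the metric tag and values shown are placeholders, and the tag must match the metric ID that you configure for the tuning job.

import hypertune

hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag="accuracy",  # must match the metric ID in your job configuration
    metric_value=0.87,                     # value computed by your evaluation code
    global_step=1000,                      # optional training step associated with the value
)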

Create a hyperparameter tuning job

Depending on what tool you want to use to create a HyperparameterTuningJob, select one of the following tabs:

Console

In the Google Cloud console, you can't create a HyperparameterTuningJob resource directly. However, you can create a TrainingPipeline resource that creates a HyperparameterTuningJob.

The following instructions describe how to create a TrainingPipeline that creates a HyperparameterTuningJob and doesn't do anything else. If you want to use additional TrainingPipeline features, like training with a managed dataset, read Creating training pipelines.

  1. In the Google Cloud console, in the Vertex AI section, go to the Training pipelines page.

    Go to Training pipelines

  2. Click Create to open the Train new model pane.

  3. On the Training method step, specify the following settings:

    1. In the Dataset drop-down list, select No managed dataset.

    2. Select Custom training (advanced).

    Click Continue.

  4. On the Model details step, choose Train new model or Train new version. If you select Train new model, enter a name of your choice, MODEL_NAME, for your model. Click Continue.

  5. On the Training container step, specify the following settings:

    1. Select whether to use a Prebuilt container or a Custom container for training.

    2. Depending on your choice, do one of the following:

    3. In the Model output directory field, you can optionally specify the Cloud Storage URI of a directory in a bucket that you have access to. The directory does not need to exist yet.

      This value gets passed to Vertex AI in the baseOutputDirectory API field, which sets several environment variables that your training application can access when it runs. A brief sketch of reading these variables follows the console steps below.

    4. Optional: In the Arguments field, you can specify arguments for Vertex AI to use when it starts running your training code. The maximum length for all arguments combined is 100,000 characters. The behavior of these arguments differs depending on what type of container you are using:

    Click Continue.

  6. On the Hyperparameter tuning step, select the Enable hyperparameter tuning checkbox and specify the following settings:

    1. In the New Hyperparameter section, specify the Parameter name and Type of a hyperparameter that you want to tune. Depending on which type you specify, configure the additional hyperparameter settings that appear.

      Learn more about hyperparameter types and their configurations.

    2. If you want to tune more than one hyperparameter, click Add new parameter and repeat the previous step in the new section that appears.

      Repeat this for each hyperparameter that you want to tune.

    3. In the Metric to optimize field and the Goal drop-down list, specify the name and goal of the metric that you want to optimize.

    4. In the Maximum number of trials field, specify the maximum number of trials that you want Vertex AI to run for your hyperparameter tuning job.

    5. In the Maximum number of parallel trials field, specify the maximum number of trials to let Vertex AI run at the same time.

    6. In the Search algorithm drop-down list, specify a search algorithm for Vertex AI to use.

    7. Ignore the Enable early stopping toggle, which has no effect.

    Click Continue.

  7. On the Compute and pricing step, specify the following settings:

    1. In the Region drop-down list, select a region that supports custom training.

    2. In the Worker pool 0 section, specify compute resources to use for training.

      If you specify accelerators, make sure the type of accelerator that you choose is available in your selected region.

      If you want to perform distributed training, then click Add more worker pools and specify an additional set of compute resources for each additional worker pool that you want.

    Click Continue.

  8. On the Prediction container step, select No prediction container.

  9. Click Start training to start the serverless training pipeline.
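The Model output directory setting described in the steps above is surfaced to your training code through environment variables. The following minimal Python sketch reads the AIP_* variables that Vertex AI derives from baseOutputDirectory; the fallback paths are illustrative only.

import os

# Vertex AI sets these variables from baseOutputDirectory for each trial.
model_dir = os.environ.get("AIP_MODEL_DIR", "/tmp/model")
checkpoint_dir = os.environ.get("AIP_CHECKPOINT_DIR", "/tmp/checkpoints")
tensorboard_log_dir = os.environ.get("AIP_TENSORBOARD_LOG_DIR", "/tmp/logs")

print(model_dir, checkpoint_dir, tensorboard_log_dir)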

gcloud

The following steps show how to use the Google Cloud CLI to create a HyperparameterTuningJob with a relatively minimal configuration. To learn about all the configuration options that you can use for this task, see the reference documentation for the gcloud ai hp-tuning-jobs create command and the HyperparameterTuningJob API resource.

  1. Create a YAML file named config.yaml with some API fields that you want to specify for your new HyperparameterTuningJob:

    config.yaml
    studySpec:
      metrics:
      - metricId: METRIC_ID
        goal: METRIC_GOAL
      parameters:
      - parameterId: HYPERPARAMETER_ID
        doubleValueSpec:
          minValue: DOUBLE_MIN_VALUE
          maxValue: DOUBLE_MAX_VALUE
    trialJobSpec:
      workerPoolSpecs:
      - machineSpec:
          machineType: MACHINE_TYPE
        replicaCount: 1
        containerSpec:
          imageUri: CUSTOM_CONTAINER_IMAGE_URI

    Replace the following:

    • METRIC_ID: the name of a hyperparameter metric to optimize. Your training code must report this metric when it runs.

    • METRIC_GOAL: the goal for your hyperparameter metric, either MAXIMIZE or MINIMIZE.

    • HYPERPARAMETER_ID: the name of a hyperparameter to tune. Your training code must parse a command-line flag with this name. For this example, the hyperparameter must take floating-point values. Learn about other hyperparameter data types.

    • DOUBLE_MIN_VALUE: the minimum value (a number) that you want Vertex AI to try for this hyperparameter.

    • DOUBLE_MAX_VALUE: the maximum value (a number) that you want Vertex AI to try for this hyperparameter.

    • MACHINE_TYPE: the type of VM to use for training.

    • CUSTOM_CONTAINER_IMAGE_URI: the URI of a Docker container image with your training code. Learn how to create a custom container image.

      For this example, you must use a custom container. HyperparameterTuningJob resources also support training code in a Python source distribution instead of a custom container.

  2. In the same directory as your config.yaml file, run the following shell command:

    gcloud ai hp-tuning-jobs create \
        --region=LOCATION \
        --display-name=DISPLAY_NAME \
        --max-trial-count=MAX_TRIAL_COUNT \
        --parallel-trial-count=PARALLEL_TRIAL_COUNT \
        --config=config.yaml

    Replace the following:

    • LOCATION: the region where you want to create the HyperparameterTuningJob.

    • DISPLAY_NAME: a memorable display name of your choice for the HyperparameterTuningJob.

    • MAX_TRIAL_COUNT: the maximum number of trials to run.

    • PARALLEL_TRIAL_COUNT: the maximum number of trials to run in parallel.

REST

Use the following code sample to create a hyperparameter tuning job using the create method of the hyperparameterTuningJob resource.

Before using any of the request data, make the following replacements:

  • LOCATION: the region where you want to create the HyperparameterTuningJob. Use a region that supports serverless training.
  • PROJECT: Your project ID.
  • DISPLAY_NAME: a memorable display name of your choice for the HyperparameterTuningJob. See the HyperparameterTuningJob REST resource.
  • Specify your metrics:
    • METRIC_ID: the name of a hyperparameter metric to optimize. Your training code must report this metric when it runs.
    • METRIC_GOAL: the goal for your hyperparameter metric, either MAXIMIZE or MINIMIZE.
  • Specify your hyperparameters:
    • PARAMETER_ID: the name of a hyperparameter to tune. Your training code must parse a command-line flag with this name.
    • PARAMETER_SCALE: (Optional.) How the parameter should be scaled. Leave unset for CATEGORICAL parameters. Can be UNIT_LINEAR_SCALE, UNIT_LOG_SCALE, UNIT_REVERSE_LOG_SCALE, or SCALE_TYPE_UNSPECIFIED.
    • If this hyperparameter's type is DOUBLE, specify the minimum (DOUBLE_MIN_VALUE) and maximum (DOUBLE_MAX_VALUE) values for this hyperparameter.
    • If this hyperparameter's type is INTEGER, specify the minimum (INTEGER_MIN_VALUE) and maximum (INTEGER_MAX_VALUE) values for this hyperparameter.
    • If this hyperparameter's type is CATEGORICAL, specify the acceptable values (CATEGORICAL_VALUES) as an array of strings.
    • If this hyperparameter's type is DISCRETE, specify the acceptable values (DISCRETE_VALUES) as an array of numbers.
    • Specify conditional hyperparameters. Conditional hyperparameters are added to a trial when the parent hyperparameter's value matches the condition you specify. Learn more about conditional hyperparameters.
      • CONDITIONAL_PARAMETER: The ParameterSpec of the conditional parameter. This specification includes the parameter's name, scale, range of values, and any conditional parameters that depend on this hyperparameter.
      • If the parent hyperparameter's type is INTEGER, specify a list of integers as the INTEGERS_TO_MATCH. If the parent hyperparameter's value matches one of the values specified, this conditional parameter is added to the trial.
      • If the parent hyperparameter's type is CATEGORICAL, specify a list of categories as the CATEGORIES_TO_MATCH. If the parent hyperparameter's value matches one of the values specified, this conditional parameter is added to the trial.
      • If the parent hyperparameter's type is DISCRETE, specify a list of integers as the DISCRETE_VALUES_TO_MATCH. If the parent hyperparameter's value matches one of the values specified, this conditional parameter is added to the trial.
  • ALGORITHM: (Optional.) The search algorithm to use in this hyperparameter tuning job. Can be ALGORITHM_UNSPECIFIED, GRID_SEARCH, or RANDOM_SEARCH.
  • MAX_TRIAL_COUNT: the maximum number of trials to run.
  • PARALLEL_TRIAL_COUNT: the maximum number of trials to run in parallel.
  • MAX_FAILED_TRIAL_COUNT: The number of jobs that can fail before the hyperparameter tuning job fails.
  • Define the trial custom training job:
    • MACHINE_TYPE: the type of VM to use for training.
    • ACCELERATOR_TYPE: (Optional.) The type of accelerator to attach to each trial.
    • ACCELERATOR_COUNT: (Optional.) The number of accelerators to attach to each trial.
    • REPLICA_COUNT: The number of worker replicas to use for each trial.
    • If your training application runs in a custom container, specify the following:
      • CUSTOM_CONTAINER_IMAGE_URI: the URI of a Docker container image with your training code. Learn how to create a custom container image.
      • CUSTOM_CONTAINER_COMMAND: (Optional.) The command to be invoked when the container is started. This command overrides the container's default entrypoint.
      • CUSTOM_CONTAINER_ARGS: (Optional.) The arguments to be passed when starting the container.
    • If your training application is a Python package that runs in a prebuilt container, specify the following:
      • PYTHON_PACKAGE_EXECUTOR_IMAGE_URI: The URI of the container image that runs the provided Python package. Learn more about prebuilt containers for training.
      • PYTHON_PACKAGE_URIS: The Cloud Storage location of the Python package files which are the training program and its dependent packages. The maximum number of package URIs is 100.
      • PYTHON_MODULE: The Python module name to run after installing the packages.
      • PYTHON_PACKAGE_ARGS: (Optional.) Command-line arguments to be passed to the Python module.
    • SERVICE_ACCOUNT: (Optional.) The service account that Vertex AI will use to run your code. Learn more about attaching a custom service account.
    • TIMEOUT: (Optional.) The maximum running time for each trial.
  • Specify the LABEL_NAME and LABEL_VALUE for any labels that you want to apply to this hyperparameter tuning job.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/hyperparameterTuningJobs

Request JSON body:

{  "displayName":DISPLAY_NAME,  "studySpec": {    "metrics": [      {        "metricId":METRIC_ID,        "goal":METRIC_GOAL      }    ],    "parameters": [      {        "parameterId":PARAMETER_ID,        "scaleType":PARAMETER_SCALE,        // Union field parameter_value_spec can be only one of the following:        "doubleValueSpec": {            "minValue":DOUBLE_MIN_VALUE,            "maxValue":DOUBLE_MAX_VALUE        },        "integerValueSpec": {            "minValue":INTEGER_MIN_VALUE,            "maxValue":INTEGER_MAX_VALUE        },        "categoricalValueSpec": {            "values": [CATEGORICAL_VALUES            ]        },        "discreteValueSpec": {            "values": [DISCRETE_VALUES            ]        }        // End of list of possible types for union field parameter_value_spec.        "conditionalParameterSpecs": [            "parameterSpec": {CONDITIONAL_PARAMETER            }            // Union field parent_value_condition can be only one of the following:            "parentIntValues": {                "values": [INTEGERS_TO_MATCH]            }            "parentCategoricalValues": {                "values": [CATEGORIES_TO_MATCH]            }            "parentDiscreteValues": {                "values": [DISCRETE_VALUES_TO_MATCH]            }            // End of list of possible types for union field parent_value_condition.        ]      }    ],    "ALGORITHM":ALGORITHM  },  "maxTrialCount":MAX_TRIAL_COUNT,  "parallelTrialCount":PARALLEL_TRIAL_COUNT,  "maxFailedTrialCount":MAX_FAILED_TRIAL_COUNT,  "trialJobSpec": {      "workerPoolSpecs": [        {          "machineSpec": {            "machineType":MACHINE_TYPE,            "acceleratorType":ACCELERATOR_TYPE,            "acceleratorCount":ACCELERATOR_COUNT          },          "replicaCount":REPLICA_COUNT,          // Union field task can be only one of the following:          "containerSpec": {            "imageUri":CUSTOM_CONTAINER_IMAGE_URI,            "command": [CUSTOM_CONTAINER_COMMAND            ],            "args": [CUSTOM_CONTAINER_ARGS            ]          },          "pythonPackageSpec": {            "executorImageUri":PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,            "packageUris": [PYTHON_PACKAGE_URIS            ],            "pythonModule":PYTHON_MODULE,            "args": [PYTHON_PACKAGE_ARGS            ]          }          // End of list of possible types for union field task.        }      ],      "scheduling": {        "TIMEOUT":TIMEOUT      },      "serviceAccount":SERVICE_ACCOUNT  },  "labels": {LABEL_NAME_1":LABEL_VALUE_1,LABEL_NAME_2":LABEL_VALUE_2  }}

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d @request.json \
"https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/hyperparameterTuningJobs"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-ContentType: "application/json; charset=utf-8" `
-InFile request.json `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT/locations/LOCATION/hyperparameterTuningJobs" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{  "name": "projects/12345/locations/us-central1/hyperparameterTuningJobs/6789",  "displayName": "myHyperparameterTuningJob",  "studySpec": {    "metrics": [      {        "metricId": "myMetric",        "goal": "MINIMIZE"      }    ],    "parameters": [      {        "parameterId": "myParameter1",        "integerValueSpec": {          "minValue": "1",          "maxValue": "128"        },        "scaleType": "UNIT_LINEAR_SCALE"      },      {        "parameterId": "myParameter2",        "doubleValueSpec": {          "minValue": 1e-07,          "maxValue": 1        },        "scaleType": "UNIT_LINEAR_SCALE"      }    ],    "ALGORITHM": "RANDOM_SEARCH"  },  "maxTrialCount": 20,  "parallelTrialCount": 1,  "trialJobSpec": {    "workerPoolSpecs": [      {        "machineSpec": {          "machineType": "n1-standard-4"        },        "replicaCount": "1",        "pythonPackageSpec": {          "executorImageUri": "us-docker.pkg.dev/vertex-ai/training/training-tf-cpu.2-1:latest",          "packageUris": [            "gs://my-bucket/my-training-application/trainer.tar.bz2"          ],          "pythonModule": "my-trainer.trainer"        }      }    ]  }}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.v1.AcceleratorType;
import com.google.cloud.aiplatform.v1.CustomJobSpec;
import com.google.cloud.aiplatform.v1.HyperparameterTuningJob;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.PythonPackageSpec;
import com.google.cloud.aiplatform.v1.StudySpec;
import com.google.cloud.aiplatform.v1.StudySpec.MetricSpec;
import com.google.cloud.aiplatform.v1.StudySpec.MetricSpec.GoalType;
import com.google.cloud.aiplatform.v1.StudySpec.ParameterSpec;
import com.google.cloud.aiplatform.v1.StudySpec.ParameterSpec.ConditionalParameterSpec;
import com.google.cloud.aiplatform.v1.StudySpec.ParameterSpec.ConditionalParameterSpec.DiscreteValueCondition;
import com.google.cloud.aiplatform.v1.StudySpec.ParameterSpec.DiscreteValueSpec;
import com.google.cloud.aiplatform.v1.StudySpec.ParameterSpec.DoubleValueSpec;
import com.google.cloud.aiplatform.v1.StudySpec.ParameterSpec.ScaleType;
import com.google.cloud.aiplatform.v1.WorkerPoolSpec;
import java.io.IOException;
import java.util.Arrays;

public class CreateHyperparameterTuningJobPythonPackageSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String displayName = "DISPLAY_NAME";
    String executorImageUri = "EXECUTOR_IMAGE_URI";
    String packageUri = "PACKAGE_URI";
    String pythonModule = "PYTHON_MODULE";
    createHyperparameterTuningJobPythonPackageSample(
        project, displayName, executorImageUri, packageUri, pythonModule);
  }

  static void createHyperparameterTuningJobPythonPackageSample(
      String project,
      String displayName,
      String executorImageUri,
      String packageUri,
      String pythonModule)
      throws IOException {
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      // study spec
      MetricSpec metric =
          MetricSpec.newBuilder().setMetricId("val_rmse").setGoal(GoalType.MINIMIZE).build();

      // decay
      DoubleValueSpec doubleValueSpec =
          DoubleValueSpec.newBuilder().setMinValue(1e-07).setMaxValue(1).build();
      ParameterSpec parameterDecaySpec =
          ParameterSpec.newBuilder()
              .setParameterId("decay")
              .setDoubleValueSpec(doubleValueSpec)
              .setScaleType(ScaleType.UNIT_LINEAR_SCALE)
              .build();
      Double[] decayValues = {32.0, 64.0};
      DiscreteValueCondition discreteValueDecay =
          DiscreteValueCondition.newBuilder().addAllValues(Arrays.asList(decayValues)).build();
      ConditionalParameterSpec conditionalParameterDecay =
          ConditionalParameterSpec.newBuilder()
              .setParameterSpec(parameterDecaySpec)
              .setParentDiscreteValues(discreteValueDecay)
              .build();

      // learning rate
      ParameterSpec parameterLearningSpec =
          ParameterSpec.newBuilder()
              .setParameterId("learning_rate")
              .setDoubleValueSpec(doubleValueSpec) // Use the same min/max as for decay
              .setScaleType(ScaleType.UNIT_LINEAR_SCALE)
              .build();
      Double[] learningRateValues = {4.0, 8.0, 16.0};
      DiscreteValueCondition discreteValueLearning =
          DiscreteValueCondition.newBuilder()
              .addAllValues(Arrays.asList(learningRateValues))
              .build();
      ConditionalParameterSpec conditionalParameterLearning =
          ConditionalParameterSpec.newBuilder()
              .setParameterSpec(parameterLearningSpec)
              .setParentDiscreteValues(discreteValueLearning)
              .build();

      // batch size
      Double[] batchSizeValues = {4.0, 8.0, 16.0, 32.0, 64.0, 128.0};
      DiscreteValueSpec discreteValueSpec =
          DiscreteValueSpec.newBuilder().addAllValues(Arrays.asList(batchSizeValues)).build();
      ParameterSpec parameter =
          ParameterSpec.newBuilder()
              .setParameterId("batch_size")
              .setDiscreteValueSpec(discreteValueSpec)
              .setScaleType(ScaleType.UNIT_LINEAR_SCALE)
              .addConditionalParameterSpecs(conditionalParameterDecay)
              .addConditionalParameterSpecs(conditionalParameterLearning)
              .build();

      // trial_job_spec
      MachineSpec machineSpec =
          MachineSpec.newBuilder()
              .setMachineType("n1-standard-4")
              .setAcceleratorType(AcceleratorType.NVIDIA_TESLA_T4)
              .setAcceleratorCount(1)
              .build();
      PythonPackageSpec pythonPackageSpec =
          PythonPackageSpec.newBuilder()
              .setExecutorImageUri(executorImageUri)
              .addPackageUris(packageUri)
              .setPythonModule(pythonModule)
              .build();
      WorkerPoolSpec workerPoolSpec =
          WorkerPoolSpec.newBuilder()
              .setMachineSpec(machineSpec)
              .setReplicaCount(1)
              .setPythonPackageSpec(pythonPackageSpec)
              .build();

      StudySpec studySpec =
          StudySpec.newBuilder()
              .addMetrics(metric)
              .addParameters(parameter)
              .setAlgorithm(StudySpec.Algorithm.RANDOM_SEARCH)
              .build();
      CustomJobSpec trialJobSpec =
          CustomJobSpec.newBuilder().addWorkerPoolSpecs(workerPoolSpec).build();

      // hyperparameter_tuning_job
      HyperparameterTuningJob hyperparameterTuningJob =
          HyperparameterTuningJob.newBuilder()
              .setDisplayName(displayName)
              .setMaxTrialCount(4)
              .setParallelTrialCount(2)
              .setStudySpec(studySpec)
              .setTrialJobSpec(trialJobSpec)
              .build();
      LocationName parent = LocationName.of(project, location);
      HyperparameterTuningJob response =
          client.createHyperparameterTuningJob(parent, hyperparameterTuningJob);
      System.out.format("response: %s\n", response);
      System.out.format("Name: %s\n", response.getName());
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt


def create_hyperparameter_tuning_job_sample(
    project: str,
    location: str,
    staging_bucket: str,
    display_name: str,
    container_uri: str,
):
    aiplatform.init(project=project, location=location, staging_bucket=staging_bucket)

    worker_pool_specs = [
        {
            "machine_spec": {
                "machine_type": "n1-standard-4",
                "accelerator_type": "NVIDIA_TESLA_K80",
                "accelerator_count": 1,
            },
            "replica_count": 1,
            "container_spec": {
                "image_uri": container_uri,
                "command": [],
                "args": [],
            },
        }
    ]

    custom_job = aiplatform.CustomJob(
        display_name="custom_job",
        worker_pool_specs=worker_pool_specs,
    )

    hpt_job = aiplatform.HyperparameterTuningJob(
        display_name=display_name,
        custom_job=custom_job,
        metric_spec={
            "loss": "minimize",
        },
        parameter_spec={
            "lr": hpt.DoubleParameterSpec(min=0.001, max=0.1, scale="log"),
            "units": hpt.IntegerParameterSpec(min=4, max=128, scale="linear"),
            "activation": hpt.CategoricalParameterSpec(values=["relu", "selu"]),
            "batch_size": hpt.DiscreteParameterSpec(values=[128, 256], scale="linear"),
        },
        max_trial_count=128,
        parallel_trial_count=8,
        labels={"my_key": "my_value"},
    )

    hpt_job.run()

    print(hpt_job.resource_name)
    return hpt_job

Hyperparameter training job configuration

Hyperparameter tuning jobs search for the best combination of hyperparametersto optimize your metrics. Hyperparameter tuning jobs do this by runningmultiple trials of your training application with different sets ofhyperparameters.

When you configure a hyperparameter tuning job, you must specify the following details:

Limit the number of trials

Decide how many trials you want to allow the service to run and set the maxTrialCount value in the HyperparameterTuningJob object.

There are two competing interests to consider when deciding how many trials toallow:

  • time (and therefore cost)
  • accuracy

Increasing the number of trials generally yields better results, but it is not always so. Usually, there is a point of diminishing returns after which additional trials have little or no effect on the accuracy. Before starting a job with a large number of trials, you may want to start with a small number of trials to gauge the effect your chosen hyperparameters have on your model's accuracy.

To get the most out of hyperparameter tuning, you shouldn't set your maximum value lower than ten times the number of hyperparameters you use.
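As a rough illustration of that rule of thumb (plain arithmetic, not an API call), you can sanity-check your chosen maximum before submitting the job:

# Rule-of-thumb check based on the guidance above; values are illustrative.
num_hyperparameters = 3  # for example: learning_rate, batch_size, dropout
min_recommended_trials = 10 * num_hyperparameters
print(f"Consider setting maxTrialCount to at least {min_recommended_trials}")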

Parallel trials

You can specify how many trials can run in parallel by setting parallelTrialCount in the HyperparameterTuningJob.

Running parallel trials has the benefit of reducing the time the training job takes (real time; the total processing time required is not typically changed). However, running in parallel can reduce the effectiveness of the tuning job overall. That is because hyperparameter tuning uses the results of previous trials to inform the values to assign to the hyperparameters of subsequent trials. When running in parallel, some trials start without having the benefit of the results of any trials still running.

If you use parallel trials, the hyperparameter tuning service provisions multiple training processing clusters (or multiple individual machines in the case of a single-process trainer). The worker pool spec that you set for your job is used for each individual training cluster.
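For a rough sense of the tradeoff, the following back-of-the-envelope estimate (illustrative arithmetic only, not an API call) shows how parallelism shortens elapsed time without reducing total trial-processing time:

import math

# Highly parallel jobs also learn less from earlier trials, so treat this purely
# as a wall-clock estimate.
max_trial_count = 30
parallel_trial_count = 5
avg_trial_minutes = 40  # illustrative average duration of one trial

elapsed_estimate = math.ceil(max_trial_count / parallel_trial_count) * avg_trial_minutes
print(f"Estimated wall-clock time: about {elapsed_estimate} minutes")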

Handle failed trials

If your hyperparameter tuning trials exit with errors, you might want to end the training job early. Set the maxFailedTrialCount field in the HyperparameterTuningJob to the number of failed trials that you want to allow. After this number of trials fails, Vertex AI ends the training job. The maxFailedTrialCount value must be less than or equal to maxTrialCount.

If you don't set maxFailedTrialCount, or if you set it to 0, Vertex AI uses the following rules to handle failing trials:

  • If the first trial of your job fails, Vertex AI ends the job immediately. Failure during the first trial suggests a problem in your training code, so further trials are also likely to fail. Ending the job lets you diagnose the problem without waiting for more trials and incurring greater costs.
  • If the first trial succeeds, Vertex AI might end the job after failures during subsequent trials based on one of the following criteria:
    • The number of failed trials has grown too high.
    • The ratio of failed trials to successful trials has grown too high.

These rules are subject to change. To ensure a specific behavior, set the maxFailedTrialCount field.
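As a hedged sketch of setting this with the Vertex AI SDK for Python, the max_failed_trial_count argument below mirrors the maxFailedTrialCount API field; the display names, placeholder values, and container URI are illustrative.

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="PROJECT", location="us-central1", staging_bucket="gs://BUCKET")

# Trial job definition; replace the placeholders with your own values.
custom_job = aiplatform.CustomJob(
    display_name="trial-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {"image_uri": "CUSTOM_CONTAINER_IMAGE_URI"},
    }],
)

hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="example-hpt-job",
    custom_job=custom_job,
    metric_spec={"loss": "minimize"},
    parameter_spec={"lr": hpt.DoubleParameterSpec(min=0.001, max=0.1, scale="log")},
    max_trial_count=20,
    parallel_trial_count=4,
    max_failed_trial_count=5,  # end the job after five trials fail
)
hpt_job.run()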

Manage hyperparameter tuning jobs

The following sections describe how to manage your hyperparameter tuning jobs.

Retrieve information about a hyperparameter tuning job

The following code samples demonstrate how to retrieve a hyperparameter tuning job.

gcloud

Use the gcloud ai hp-tuning-jobs describe command:

gcloud ai hp-tuning-jobs describe ID_OR_NAME \
    --region=LOCATION

Replace the following:

  • ID_OR_NAME: either the name or the numerical ID of the HyperparameterTuningJob. (The ID is the last part of the name.)

    You might have seen the ID or name when you created the HyperparameterTuningJob. If you don't know the ID or name, you can run the gcloud ai hp-tuning-jobs list command and look for the appropriate resource.

  • LOCATION: the region where the HyperparameterTuningJob was created.

REST

Use the following code sample to retrieve a hyperparameter tuning job using the get method of the hyperparameterTuningJob resource.

Before using any of the request data, make the following replacements:

  • LOCATION: the region where the HyperparameterTuningJob was created.
  • NAME: The name of the hyperparameter tuning job. The job name uses the following format: projects/{project}/locations/{location}/hyperparameterTuningJobs/{hyperparameterTuningJob}.

HTTP method and URL:

GET https://LOCATION-aiplatform.googleapis.com/v1/NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:

curl -X GET \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/NAME"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method GET `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/NAME" | Select-Object -Expand Content

You should receive a JSON response similar to the following:

{  "name": "projects/12345/LOCATIONs/us-central1/hyperparameterTuningJobs/6789",  "displayName": "my-hyperparameter-tuning-job",  "studySpec": {    "metrics": [      {        "metricId": "my_metric",        "goal": "MINIMIZE"      }    ],    "parameters": [      {        "parameterId": "my_parameter",        "doubleValueSpec": {          "minValue": 1e-05,          "maxValue": 1        }      }    ]  },  "maxTrialCount": 3,  "parallelTrialCount": 1,  "trialJobSpec": {    "workerPoolSpecs": [      {        "machineSpec": {          "machineType": "n1-standard-4"        },        "replicaCount": "1",        "pythonPackageSpec": {          "executorImageUri": "us-docker.pkg.dev/vertex-ai/training/training-tf-cpu.2-1:latest",          "packageUris": [            "gs://my-bucket/my-training-application/trainer.tar.bz2"          ],          "pythonModule": "my-trainer.trainer"        }      }    ]  },  "trials": [    {      "id": "2",      "state": "SUCCEEDED",      "parameters": [        {          "parameterId": "my_parameter",          "value": 0.71426874725564571        }      ],      "finalMeasurement": {        "stepCount": "2",        "metrics": [          {            "metricId": "my_metric",            "value": 0.30007445812225342          }        ]      },      "startTime": "2020-09-09T23:39:15.549112551Z",      "endTime": "2020-09-09T23:47:08Z"    },    {      "id": "3",      "state": "SUCCEEDED",      "parameters": [        {          "parameterId": "my_parameter",          "value": 0.3078893356622992        }      ],      "finalMeasurement": {        "stepCount": "2",        "metrics": [          {            "metricId": "my_metric",            "value": 0.30000102519989014          }        ]      },      "startTime": "2020-09-09T23:49:22.451699360Z",      "endTime": "2020-09-09T23:57:15Z"    },    {      "id": "1",      "state": "SUCCEEDED",      "parameters": [        {          "parameterId": "my_parameter",          "value": 0.500005        }      ],      "finalMeasurement": {        "stepCount": "2",        "metrics": [          {            "metricId": "my_metric",            "value": 0.30005377531051636          }        ]      },      "startTime": "2020-09-09T23:23:12.283374629Z",      "endTime": "2020-09-09T23:36:56Z"    }  ],  "state": "JOB_STATE_SUCCEEDED",  "createTime": "2020-09-09T23:22:31.777386Z",  "startTime": "2020-09-09T23:22:34Z",  "endTime": "2020-09-10T01:31:24.271307Z",  "updateTime": "2020-09-10T01:31:24.271307Z"}

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

import com.google.cloud.aiplatform.v1.HyperparameterTuningJob;
import com.google.cloud.aiplatform.v1.HyperparameterTuningJobName;
import com.google.cloud.aiplatform.v1.JobServiceClient;
import com.google.cloud.aiplatform.v1.JobServiceSettings;
import java.io.IOException;

public class GetHyperparameterTuningJobSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String hyperparameterTuningJobId = "HYPERPARAMETER_TUNING_JOB_ID";
    getHyperparameterTuningJobSample(project, hyperparameterTuningJobId);
  }

  static void getHyperparameterTuningJobSample(String project, String hyperparameterTuningJobId)
      throws IOException {
    JobServiceSettings settings =
        JobServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (JobServiceClient client = JobServiceClient.create(settings)) {
      HyperparameterTuningJobName name =
          HyperparameterTuningJobName.of(project, location, hyperparameterTuningJobId);
      HyperparameterTuningJob response = client.getHyperparameterTuningJob(name);
      System.out.format("response: %s\n", response);
    }
  }
}

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def get_hyperparameter_tuning_job_sample(
    project: str,
    hyperparameter_tuning_job_id: str,
    location: str = "us-central1",
):
    aiplatform.init(project=project, location=location)

    hpt_job = aiplatform.HyperparameterTuningJob.get(
        resource_name=hyperparameter_tuning_job_id,
    )

    return hpt_job

Cancel a hyperparameter tuning job

The following code samples demonstrate how to cancel a hyperparameter tuning job.

gcloud

Use the gcloud ai hp-tuning-jobs cancel command:

gcloud ai hp-tuning-jobs cancel ID_OR_NAME \
    --region=LOCATION

Replace the following:

  • ID_OR_NAME: either the name or the numerical ID of the HyperparameterTuningJob. (The ID is the last part of the name.)

    You might have seen the ID or name when you created the HyperparameterTuningJob. If you don't know the ID or name, you can run the gcloud ai hp-tuning-jobs list command and look for the appropriate resource.

  • LOCATION: the region where the HyperparameterTuningJob was created.

REST

Use the following code sample to cancel a hyperparameter tuning job using the cancel method of the hyperparameterTuningJob resource.

Before using any of the request data, make the following replacements:

  • LOCATION: the region where the HyperparameterTuningJob was created.
  • NAME: The name of the hyperparameter tuning job. The job name uses the following format: projects/{project}/locations/{location}/hyperparameterTuningJobs/{hyperparameterTuningJob}.

HTTP method and URL:

POST https://LOCATION-aiplatform.googleapis.com/v1/NAME:cancel

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json; charset=utf-8" \
-d "" \
"https://LOCATION-aiplatform.googleapis.com/v1/NAME:cancel"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method POST `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/NAME:cancel" | Select-Object -Expand Content

You should receive a successful status code (2xx) and an empty response.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def cancel_hyperparameter_tuning_job_sample(
    project: str,
    hyperparameter_tuning_job_id: str,
    location: str = "us-central1",
):
    aiplatform.init(project=project, location=location)

    hpt_job = aiplatform.HyperparameterTuningJob.get(
        resource_name=hyperparameter_tuning_job_id,
    )

    hpt_job.cancel()

Delete a hyperparameter tuning job

The following code samples demonstrate how to delete a hyperparameter tuning job using the Vertex AI SDK for Python and the REST API.

REST

Use the following code sample to delete a hyperparameter tuning job using the delete method of the hyperparameterTuningJob resource.

Before using any of the request data, make the following replacements:

  • LOCATION: Your region.
  • NAME: The name of the hyperparameter tuning job. The job name uses the following format: projects/{project}/locations/{location}/hyperparameterTuningJobs/{hyperparameterTuningJob}.

HTTP method and URL:

DELETE https://LOCATION-aiplatform.googleapis.com/v1/NAME

To send your request, expand one of these options:

curl (Linux, macOS, or Cloud Shell)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login, or by using Cloud Shell, which automatically logs you into the gcloud CLI. You can check the currently active account by running gcloud auth list.

Execute the following command:

curl -X DELETE \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://LOCATION-aiplatform.googleapis.com/v1/NAME"

PowerShell (Windows)

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login. You can check the currently active account by running gcloud auth list.

Execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
-Method DELETE `
-Headers $headers `
-Uri "https://LOCATION-aiplatform.googleapis.com/v1/NAME" | Select-Object -Expand Content

You should receive a successful status code (2xx) and an empty response.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from google.cloud import aiplatform


def delete_hyperparameter_tuning_job_sample(
    project: str,
    hyperparameter_tuning_job_id: str,
    location: str = "us-central1",
):
    aiplatform.init(project=project, location=location)

    hpt_job = aiplatform.HyperparameterTuningJob.get(
        resource_name=hyperparameter_tuning_job_id,
    )

    hpt_job.delete()

What's next
