Prediction classes

The Vertex AI SDK includes the following prediction classes. One class is for batch predictions. The others are related to online predictions or Vector Search predictions. For more information, see Overview of getting predictions on Vertex AI.

Batch prediction class

A batch prediction is a group of asynchronous prediction requests. You request batch predictions from the model resource without needing to deploy the model to an endpoint. Batch predictions are suitable when you don't need an immediate response and want to process data with a single request. BatchPredictionJob is the one class in the Vertex AI SDK that is specific to batch predictions.

BatchPredictionJob

The BatchPredictionJob class represents a group of asynchronous prediction requests. There are two ways to create a batch prediction job:

  1. The preferred way to create a batch prediction job is to use the batch_predict method on your trained Model. This method requires the following parameters:

    • instances_format: The format of the batch prediction request file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
    • predictions_format: The format of the batch prediction response file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
    • gcs_source: A list of one or more Cloud Storage paths to your batch prediction requests.
    • gcs_destination_prefix: The Cloud Storage path to which Vertex AI writes the predictions.

    The following code is an example of how you might call Model.batch_predict:

    batch_prediction_job = model.batch_predict(
        instances_format="jsonl",
        predictions_format="jsonl",
        job_display_name="your_job_display_name_string",
        gcs_source=['gs://path/to/my/dataset.csv'],
        gcs_destination_prefix='gs://path/to/my/destination',
        model_parameters=None,
        starting_replica_count=1,
        max_replica_count=5,
        machine_type="n1-standard-4",
        sync=True,
    )
  2. The second way to create a batch prediction job is to call the BatchPredictionJob.create method, which requires four parameters (see the example after this list):

    • job_display_name: A name that you assign to the batch prediction job. Note that while job_display_name is required for BatchPredictionJob.create, it is optional for Model.batch_predict.
    • model_name: The fully qualified name or ID of the trained Model you use for the batch prediction job.
    • instances_format: The format of the batch prediction request file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
    • predictions_format: The format of the batch prediction response file: jsonl, csv, bigquery, tf-record, tf-record-gzip, or file-list.
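
    The following code is a sketch of how you might call BatchPredictionJob.create; the model ID and Cloud Storage paths shown are placeholders rather than values from this guide:

    batch_prediction_job = aiplatform.BatchPredictionJob.create(
        job_display_name="your_job_display_name_string",
        model_name="your_model_id",  # placeholder model ID
        instances_format="jsonl",
        predictions_format="jsonl",
        gcs_source=["gs://path/to/my/instances.jsonl"],
        gcs_destination_prefix="gs://path/to/my/destination",
    )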

Online prediction classes

Online predictions are synchronous requests made to a model endpoint. You must deploy your model to an endpoint before you can make an online prediction request. Use online predictions when you want predictions that are generated based on application input or when you need a fast prediction response.

Endpoint

Before you can get online predictions from your model, you must deploy your model to an endpoint. When you deploy a model to an endpoint, you associate the physical machine resources with the model so it can serve online predictions.

You can deploy more than one model to one endpoint. You can also deploy one model to more than one endpoint. For more information, see Considerations for deploying models.

To create an Endpoint resource, you deploy your model. When you call the Model.deploy method, it creates and returns an Endpoint.

The following is a sample code snippet that shows how to create a custom training job, create and train a model, and then deploy the model to an endpoint.

# Create your custom training job
job = aiplatform.CustomTrainingJob(
    display_name="my_custom_training_job",
    script_path="task.py",
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-8:latest",
    requirements=["google-cloud-bigquery>=2.20.0", "db-dtypes"],
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest",
)

# Start the training and create your model
model = job.run(
    dataset=dataset,
    model_display_name="my_model_name",
    bigquery_destination=f"bq://{project_id}",
)

# Create an endpoint and deploy your model to that endpoint
endpoint = model.deploy(deployed_model_display_name="my_deployed_model")

# Get predictions using test data in a DataFrame named 'df_my_test_data'
predictions = endpoint.predict(instances=df_my_test_data)

PrivateEndpoint

A private endpoint is like an Endpoint resource, except predictions are sent across a secure network to the Vertex AI online prediction service. Use a private endpoint if your organization wants to keep all traffic private.

To use a private endpoint, you must configure Vertex AI to peer with a Virtual Private Cloud (VPC). A VPC is required for the private prediction endpoint to connect directly with Vertex AI. For more information, see Set up VPC network peering and Use private endpoints for online prediction.
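
The following is a minimal sketch of how you might create a PrivateEndpoint and deploy a trained model to it, assuming VPC network peering is already set up for the network shown; the project number, network name, machine type, and prediction instances are placeholders:

# Create a private endpoint on a VPC network that is peered with Vertex AI
private_endpoint = aiplatform.PrivateEndpoint.create(
    display_name="my_private_endpoint",
    network="projects/123456789123/global/networks/my_vpc",  # placeholder network
)

# Deploy a trained model to the private endpoint
private_endpoint.deploy(
    model=model,
    machine_type="n1-standard-4",
)

# Predictions travel over the peered network instead of the public internet
predictions = private_endpoint.predict(instances=[[1.0, 2.0, 3.0]])  # placeholder instances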

ModelDeploymentMonitoringJob

Use the ModelDeploymentMonitoringJob resource to monitor your model and receive alerts if it deviates in a way that might impact the quality of your model's predictions.

When the input data deviates from the data used to train your model, the model's performance can deteriorate, even if the model hasn't changed. Model monitoring analyzes input data for feature skew and drift:

  • Skew occurs when the production feature data distribution deviates from the feature data used to train the model.
  • Drift occurs when the production feature data changes significantly over time.

For more information, see Introduction to Vertex AI model monitoring. For an example of how to implement Vertex AI monitoring with the Vertex AI SDK, see the Vertex AI model monitoring with explainable AI feature attributions notebook on GitHub.
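
As a minimal sketch of how a monitoring job for a deployed endpoint might be created with the Vertex AI SDK, the following uses the aiplatform.model_monitoring configuration helpers. The sampling rate, monitoring interval, email address, training data path, feature name, and thresholds are all illustrative placeholders:

from google.cloud.aiplatform import model_monitoring

# Sample 80% of prediction requests and run the analysis every hour (placeholder values)
sampling_strategy = model_monitoring.RandomSampleConfig(sample_rate=0.8)
schedule_config = model_monitoring.ScheduleConfig(monitor_interval=1)

# Send an email alert when skew or drift exceeds a threshold (placeholder address)
alert_config = model_monitoring.EmailAlertConfig(user_emails=["you@example.com"])

# Compare production features against the training data to detect skew,
# and against earlier production data to detect drift (placeholder feature and thresholds)
skew_config = model_monitoring.SkewDetectionConfig(
    data_source="gs://my_bucket/training_data.csv",
    target_field="target_column",
    skew_thresholds={"feature_1": 0.3},
)
drift_config = model_monitoring.DriftDetectionConfig(
    drift_thresholds={"feature_1": 0.3},
)
objective_config = model_monitoring.ObjectiveConfig(
    skew_detection_config=skew_config,
    drift_detection_config=drift_config,
)

# Create the monitoring job for an existing Endpoint
monitoring_job = aiplatform.ModelDeploymentMonitoringJob.create(
    display_name="my_monitoring_job",
    endpoint=endpoint,
    logging_sampling_strategy=sampling_strategy,
    schedule_config=schedule_config,
    alert_config=alert_config,
    objective_configs=objective_config,
)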

Vector Search prediction classes

Vector Search is a managed service that builds similarity indexes, or vectors, to perform similarity matching. There are two high-level steps to perform similarity matching:

  1. Create a vector representation of your data. Data can be text, images, video, audio, or tabular data (see the text embedding sketch after this list).

  2. Vector Search uses the endpoints of the vectors you create to perform a high-scale, low-latency search for similar vectors.
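
For example, for text data, one way to create vector representations is with a Vertex AI text embedding model. The following sketch assumes an initialized Vertex AI SDK and the textembedding-gecko@003 model; the input string is a placeholder:

from vertexai.language_models import TextEmbeddingModel

# Create vector representations (embeddings) of text data
embedding_model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
embeddings = embedding_model.get_embeddings(["How do I get predictions from my model?"])

# Each embedding is a list of floats that you can write to Cloud Storage for indexing
vector = embeddings[0].values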

For more information, see Vector Search overview and the Create a Vector Search index notebook on GitHub.

MatchingEngineIndex

The MatchingEngineIndex class represents the indexes, or vectors, you create that Vector Search uses to perform its similarity search.

There are two search algorithms you can use for your index:

  1. TreeAhConfig uses the tree-AH algorithm (a shallow tree using asymmetric hashing). Use MatchingEngineIndex.create_tree_ah_index to create an index that uses the tree-AH algorithm.
  2. BruteForceConfig uses a standard linear search. Use MatchingEngineIndex.create_brute_force_index to create an index that uses a standard linear search.

For more information about how you can configure your indexes, see Configure indices.

The following code is an example of creating an index that uses the tree-AH algorithm:

my_tree_ah_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name="my_display_name",
    contents_delta_uri="gs://my_bucket/embeddings",
    dimensions=1,
    approximate_neighbors_count=150,
    distance_measure_type="SQUARED_L2_DISTANCE",
    leaf_node_embedding_count=100,
    leaf_nodes_to_search_percent=50,
    description="my description",
    labels={"label_name": "label_value"},
)

The following code is an example of creating an index that uses the brute force algorithm:

my_brute_force_index = aiplatform.MatchingEngineIndex.create_brute_force_index(
    display_name="my_display_name",
    contents_delta_uri="gs://my_bucket/embeddings",
    dimensions=1,
    distance_measure_type="SQUARED_L2_DISTANCE",
    description="my description",
    labels={"label_name": "label_value"},
)

MatchingEngineIndexEndpoint

Use the MatchingEngineIndexEndpoint class to create and retrieve an endpoint. After you deploy an index to your endpoint, you get an IP address that you use to run your queries.

The following code is an example of creating a matching engine index endpoint and then deploying a matching engine index to it:

my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name="sample_index_endpoint",
    description="index endpoint description",
    network="projects/123456789123/global/networks/my_vpc",
)

my_index_endpoint = my_index_endpoint.deploy_index(
    index=my_tree_ah_index,
    deployed_index_id="my_matching_engine_index_id",
)
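
After the index is deployed, you can query the endpoint for nearest neighbors. The following is a sketch that assumes the index deployed above over a VPC network; the query vector and neighbor count are placeholders:

# Query the deployed index for the nearest neighbors of a vector
response = my_index_endpoint.match(
    deployed_index_id="my_matching_engine_index_id",
    queries=[[0.1]],  # one query vector; must match the index dimensions
    num_neighbors=10,
)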

