Overview of getting inferences on Vertex AI
An inference is the output of a trained machine learning model. This page provides an overview of the workflow for getting inferences from your models on Vertex AI.
Vertex AI offers two methods for getting inferences:
- Online inferences are synchronous requests made to a model that is deployed to an endpoint. Therefore, before sending a request, you must first deploy the Model resource to an endpoint. This associates compute resources with the model so that the model can serve online inferences with low latency. Use online inferences when you are making requests in response to application input or in situations that require timely inference.
- Batch inferences are asynchronous requests made to a model that isn't deployed to an endpoint. You send the request (as a BatchPredictionJob resource) directly to the Model resource. Use batch inferences when you don't require an immediate response and want to process accumulated data by using a single request.
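The following sketch, using the Vertex AI SDK for Python, shows what these two paths can look like in practice. It is a minimal sketch rather than a complete recipe: the project ID, region, model ID, bucket paths, and instance fields are placeholders, and the instance format depends on your model's input schema.

```python
from google.cloud import aiplatform

# Placeholder project, region, and model ID -- replace with your own values.
aiplatform.init(project="your-project-id", location="us-central1")
model = aiplatform.Model("your-model-id")

# Online inference: deploy the Model resource to an endpoint, then send
# synchronous requests to that endpoint.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)
response = endpoint.predict(
    instances=[{"feature_1": 1.0, "feature_2": "a"}]  # format depends on your model
)
print(response.predictions)

# Batch inference: no deployment needed. The job reads accumulated input from
# Cloud Storage (or BigQuery) and writes results asynchronously.
batch_job = model.batch_predict(
    job_display_name="example-batch-job",
    gcs_source="gs://your-bucket/input.jsonl",
    gcs_destination_prefix="gs://your-bucket/output/",
    machine_type="n1-standard-4",
)
batch_job.wait()
```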
Get inferences from custom trained models
To get inferences, you must first import your model. After it's imported, it becomes a Model resource that is visible in Vertex AI Model Registry.
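For example, if your custom-trained model artifacts are in Cloud Storage, they can be imported with the Vertex AI SDK for Python. This is a minimal sketch: the bucket path and display name are placeholders, and the prebuilt serving container shown is only an example; use the image that matches your framework.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Import (upload) the trained model artifacts. The result is a Model resource
# that appears in Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    artifact_uri="gs://your-bucket/model/",  # placeholder path to saved artifacts
    serving_container_image_uri=(
        # Example prebuilt prediction container; pick the one for your framework.
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
print(model.resource_name)
```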
Then, read the following documentation to learn how to get inferences:
Get inferences from AutoML models
Unlike custom trained models, AutoML models are automatically imported into the Vertex AI Model Registry after training.
Other than that, the workflow for AutoML models is similar, but varies slightly based on your data type and model objective. The documentation for getting AutoML inferences is located alongside the other AutoML documentation. Here are links to the documentation:
Image
Learn how to get inferences from the following types of image AutoML models:
Tabular
Learn how to get inferences from the following types of tabular AutoML models:
Tabular classification and regression models
Tabular forecasting models (batch inferences only)
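As a sketch of what the tabular workflow can look like, the snippet below looks up an AutoML tabular model that training already placed in the Model Registry and requests batch inferences against a BigQuery table. The display name and BigQuery URIs are placeholders for your own values.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# AutoML training already registered the model, so look it up by display name.
model = aiplatform.Model.list(filter='display_name="my-automl-tabular-model"')[0]

# Request batch inferences over a BigQuery table; results are written to a
# BigQuery dataset.
batch_job = model.batch_predict(
    job_display_name="automl-tabular-batch-job",
    bigquery_source="bq://your-project.your_dataset.input_table",
    bigquery_destination_prefix="bq://your-project.your_output_dataset",
)
batch_job.wait()
print(batch_job.resource_name)
```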
Get inferences from BigQuery ML models
You can get inferences from BigQuery ML models in two ways:
- Request batch inferences directly from the model in BigQuery ML.
- Register the models directly with the Model Registry, without exporting them from BigQuery ML or importing them into the Model Registry.
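As a minimal sketch of the first option, you can request batch inferences directly in BigQuery with an ML.PREDICT query; here it is issued through the BigQuery client library for Python, and the dataset, model, and table names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder project ID

# Batch inference happens inside BigQuery ML with ML.PREDICT; no export or
# deployment to a Vertex AI endpoint is required.
query = """
SELECT *
FROM ML.PREDICT(
  MODEL `your_dataset.your_bqml_model`,
  TABLE `your_dataset.input_table`
)
"""
for row in client.query(query).result():
    print(dict(row))
```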