Overview of getting inferences on Vertex AI
An inference is the output of a trained machine learning model. This page provides an overview of the workflow for getting inferences from your models on Vertex AI.
Vertex AI offers two methods for getting inferences:
- Online inferences are synchronous requests made to a model that is deployed to an endpoint. Therefore, before sending a request, you must first deploy the Model resource to an endpoint. This associates compute resources with the model so that the model can serve online inferences with low latency. Use online inferences when you are making requests in response to application input or in situations that require timely inference.
- Batch inferences are asynchronous requests made to a model that isn't deployed to an endpoint. You send the request (as a BatchPredictionJob resource) directly to the Model resource. Use batch inferences when you don't require an immediate response and want to process accumulated data by using a single request.
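The following sketch, using the Vertex AI SDK for Python, shows what these two paths can look like in practice. It is a minimal sketch rather than a complete recipe: the project ID, region, model ID, bucket paths, and instance fields are placeholders, and the instance format depends on your model's input schema.

```python
from google.cloud import aiplatform

# Placeholder project, region, and model ID -- replace with your own values.
aiplatform.init(project="your-project-id", location="us-central1")
model = aiplatform.Model("your-model-id")

# Online inference: deploy the Model resource to an endpoint, then send
# synchronous requests to that endpoint.
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=1,
)
response = endpoint.predict(
    instances=[{"feature_1": 1.0, "feature_2": "a"}]  # format depends on your model
)
print(response.predictions)

# Batch inference: no deployment needed. The job reads accumulated input from
# Cloud Storage (or BigQuery) and writes results asynchronously.
batch_job = model.batch_predict(
    job_display_name="example-batch-job",
    gcs_source="gs://your-bucket/input.jsonl",
    gcs_destination_prefix="gs://your-bucket/output/",
    machine_type="n1-standard-4",
)
batch_job.wait()
```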
Get inferences from custom trained models
To get inferences, you must first import your model. After it's imported, it becomes a Model resource that is visible in Vertex AI Model Registry.
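For example, if your custom-trained model artifacts are in Cloud Storage, they can be imported with the Vertex AI SDK for Python. This is a minimal sketch: the bucket path and display name are placeholders, and the prebuilt serving container shown is only an example; use the image that matches your framework.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Import (upload) the trained model artifacts. The result is a Model resource
# that appears in Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="my-custom-model",
    artifact_uri="gs://your-bucket/model/",  # placeholder path to saved artifacts
    serving_container_image_uri=(
        # Example prebuilt prediction container; pick the one for your framework.
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),
)
print(model.resource_name)
```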
Then, read the following documentation to learn how to get inferences:
Get inferences from AutoML models
Unlike custom trained models, AutoML models are automatically imported into the Vertex AI Model Registry after training.
Other than that, the workflow for AutoML models is similar, but varies slightly based on your data type and model objective. The documentation for getting AutoML inferences is located alongside the other AutoML documentation. Here are links to the documentation:
Image
Learn how to get inferences from the following types of image AutoML models:
Tabular
Learn how to get inferences from the following types of tabular AutoML models:
Tabular classification and regression models
Tabular forecasting models (batch inferences only)
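As a sketch of what the tabular workflow can look like, the snippet below looks up an AutoML tabular model that training already placed in the Model Registry and requests batch inferences against a BigQuery table. The display name and BigQuery URIs are placeholders for your own values.

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# AutoML training already registered the model, so look it up by display name.
model = aiplatform.Model.list(filter='display_name="my-automl-tabular-model"')[0]

# Request batch inferences over a BigQuery table; results are written to a
# BigQuery dataset.
batch_job = model.batch_predict(
    job_display_name="automl-tabular-batch-job",
    bigquery_source="bq://your-project.your_dataset.input_table",
    bigquery_destination_prefix="bq://your-project.your_output_dataset",
)
batch_job.wait()
print(batch_job.resource_name)
```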
Get inferences from BigQuery ML models
You can get inferences from BigQuery ML models in two ways:
- Request batch inferences directly from the model in BigQuery ML.
- Register the models directly with the Model Registry, without exporting them from BigQuery ML or importing them into the Model Registry.
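As a minimal sketch of the first option, you can request batch inferences directly in BigQuery with an ML.PREDICT query; here it is issued through the BigQuery client library for Python, and the dataset, model, and table names are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")  # placeholder project ID

# Batch inference happens inside BigQuery ML with ML.PREDICT; no export or
# deployment to a Vertex AI endpoint is required.
query = """
SELECT *
FROM ML.PREDICT(
  MODEL `your_dataset.your_bqml_model`,
  TABLE `your_dataset.input_table`
)
"""
for row in client.query(query).result():
    print(dict(row))
```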