Cloud Monitoring metrics for Vertex AI

Vertex AI exports metrics to Cloud Monitoring. Vertex AI also shows some of these metrics in the Vertex AI Google Cloud console. You can use Cloud Monitoring to create dashboards or configure alerts based on the metrics. For example, you can receive alerts if a model's prediction latency in Vertex AI gets too high.
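
As an illustration, the following sketch creates such a latency alert with the google-cloud-monitoring Python client. The metric type, the 500 ms threshold, and the project ID are assumptions for illustration only; check the aiplatform metrics list for the exact metric name and its unit before relying on it.

```python
from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder: your Google Cloud project ID

client = monitoring_v3.AlertPolicyServiceClient()

# Alert when the 99th-percentile online prediction latency stays above the
# threshold for 5 minutes. The metric type below is assumed; confirm it
# (and its unit) in the aiplatform metrics list.
policy = monitoring_v3.AlertPolicy(
    display_name="Vertex AI: high prediction latency",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="p99 latency above threshold",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'metric.type = "aiplatform.googleapis.com/prediction/online/prediction_latencies" '
                    'AND resource.type = "aiplatform.googleapis.com/Endpoint"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=500.0,  # assumed to be milliseconds
                duration={"seconds": 300},
                aggregations=[
                    monitoring_v3.Aggregation(
                        alignment_period={"seconds": 300},
                        per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
                    )
                ],
            ),
        )
    ],
)

created = client.create_alert_policy(
    name=f"projects/{project_id}", alert_policy=policy
)
print(f"Created alert policy: {created.name}")
```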

The following sections describe the metrics provided in the Vertex AI Google Cloud console, which might be direct or calculated metrics that Vertex AI sends to Cloud Monitoring.

To view a list of most of the metrics that Vertex AI exports to Cloud Monitoring, see aiplatform. For custom training metrics, see the metric types that start with training in the ml section.
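
If you'd rather enumerate these metrics programmatically, a minimal sketch with the google-cloud-monitoring Python client is shown below (the project ID is a placeholder):

```python
from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder

client = monitoring_v3.MetricServiceClient()

# List every metric descriptor in the aiplatform.googleapis.com namespace.
descriptors = client.list_metric_descriptors(
    request={
        "name": f"projects/{project_id}",
        "filter": 'metric.type = starts_with("aiplatform.googleapis.com/")',
    }
)
for descriptor in descriptors:
    print(descriptor.type, "-", descriptor.description)
```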

Custom training monitoring metrics

When you perform custom training, you can monitor the following types of resource usage for each training node:

  • CPU or GPU utilization of each training node
  • Memory utilization of each training node
  • Network usage (bytes sent per second and bytes received per second)

If you are using hyperparameter tuning, you can see the metrics for each trial.

To view these metrics after you have initiated custom training, do the following:

  1. In the Google Cloud console, go to one of the following pages, depending on whether you are using hyperparameter tuning:

  2. Click the name of your custom training resource.

    If you created a custom TrainingPipeline resource, then click the name of the job created by the TrainingPipeline; for example, TRAINING_PIPELINE_NAME-custom-job or TRAINING_PIPELINE_NAME-hyperparameter-tuning-job.

  3. Click the CPU, GPU, or Network tab to view utilization charts for the metric that you are interested in.

    If you are using hyperparameter tuning, you can click a row in the Hyperparameter tuning trials table to view metrics for a specific trial.

To see older metrics or to customize how you view metrics, use Monitoring. Vertex AI exports custom training metrics to Monitoring as metric types with the prefix ml.googleapis.com/training. The monitored resource type is cloudml_job.

Note that AI Platform Training exports metrics to Monitoring with the same metric types and resource type.
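
For example, here is a minimal sketch that reads the last hour of per-node CPU utilization for custom training jobs. It assumes the metric type ml.googleapis.com/training/cpu/utilization (one of the training metric types mentioned above) and the job_id and task_name labels of the cloudml_job resource; verify these against the metrics reference.

```python
import time

from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": (
            'metric.type = "ml.googleapis.com/training/cpu/utilization" '
            'AND resource.type = "cloudml_job"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # Each series corresponds to one training node of one job; points are
    # returned newest first.
    job_id = series.resource.labels.get("job_id", "unknown")
    task = series.resource.labels.get("task_name", "unknown")
    print(job_id, task, series.points[0].value.double_value)
```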

Endpoint monitoring metrics

After you deploy a model to an endpoint, you can monitor the endpoint to understand your model's performance and resource usage. You can track metrics such as traffic patterns, error rates, latency, and resource utilization to ensure that your model consistently and predictably responds to requests. For example, you might redeploy your model with a different machine type to optimize for cost. After you make the change, you can monitor the model to check whether your changes adversely affected its performance.

In Cloud Monitoring, the monitored resource type for deployed models is aiplatform.googleapis.com/Endpoint.
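
As a sketch, the following query reads the last hour of the online prediction count for one endpoint and converts it to a per-second rate. The metric type, the endpoint_id resource label, and the deployed_model_id metric label are assumptions patterned on the aiplatform metrics list; verify them for your project.

```python
import time

from google.cloud import monitoring_v3

project_id = "your-project-id"   # placeholder
endpoint_id = "1234567890"       # placeholder: numeric ID of your endpoint

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": (
            'metric.type = "aiplatform.googleapis.com/prediction/online/prediction_count" '
            'AND resource.type = "aiplatform.googleapis.com/Endpoint" '
            f'AND resource.labels.endpoint_id = "{endpoint_id}"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
        ),
        # ALIGN_RATE turns the delta prediction count into predictions/second.
        "aggregation": monitoring_v3.Aggregation(
            {
                "alignment_period": {"seconds": 60},
                "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_RATE,
            }
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    model_id = series.metric.labels.get("deployed_model_id", "unknown")
    print(f"deployed model {model_id}: {series.points[0].value.double_value:.2f}/s")
```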

Performance metrics

Performance metrics can help you find information about your model's traffic patterns, errors, and latency. You can view the following performance metrics in the Google Cloud console; a query sketch for the latency metrics follows the list.

  • Predictions per second: The number of predictions per second across both online and batch predictions. If you have more than one instance per request, each instance is counted in this chart.
  • Prediction error percentage: The rate of errors that your model is producing. A high error rate might indicate an issue with the model or with the requests to the model. View the response codes chart to determine which errors are occurring.
  • Model latency (for tabular and custom models only): The time spent performing computation.
  • Overhead latency (for tabular and custom models only): The total time spent processing a request, outside of computation.
  • Total latency duration: The total time that a request spends in the service, which is the model latency plus the overhead latency.
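
The latency metrics above are distribution-valued, so a percentile aligner is the natural way to read them. A sketch follows, again with assumed metric and label names (the latency_type label in particular is a guess at how model and overhead latency are distinguished; check the aiplatform metrics list):

```python
import time

from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": (
            'metric.type = "aiplatform.googleapis.com/prediction/online/prediction_latencies" '
            'AND resource.type = "aiplatform.googleapis.com/Endpoint"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
        ),
        # Collapse each 5-minute window of the latency distribution to its p99.
        "aggregation": monitoring_v3.Aggregation(
            {
                "alignment_period": {"seconds": 300},
                "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
            }
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # latency_type is assumed to distinguish model vs. overhead latency.
    kind = series.metric.labels.get("latency_type", "total")
    p99s = [round(p.value.double_value, 1) for p in series.points]
    print(kind, p99s)
```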

Resource usage

Resource usage metrics can help you track your model's CPU usage, memory usage, and network usage. You can view the following usage metrics in the Google Cloud console.

  • Replica count: The number of active replicas used by the deployed model.
  • Replica target: The number of active replicas required for the deployed model.
  • CPU usage: Current CPU core usage rate of the deployed model replica. 100% represents one fully utilized CPU core, so a replica may achieve more than 100% utilization if its machine type has multiple cores.
  • Memory usage: The amount of memory allocated by the deployed model replica and currently in use.
  • Network bytes sent: The number of bytes sent over the network by the deployed model replica.
  • Network bytes received: The number of bytes received over the network by the deployed model replica.
  • Accelerator average duty cycle: The average fraction of time over the past sample period during which one or more accelerators were actively processing.
  • Accelerator memory usage: The amount of accelerator memory allocated by the deployed model replica.

View endpoint monitoring metric charts

  1. Go to the Vertex AI Endpoints page in the Google Cloud console.

  2. Click the name of an endpoint to view its metrics.

  3. Below the chart intervals, click Performance or Resource usage to view the performance or resource usage metrics.

    You can select different chart intervals to see metric values over a particular time period, such as 1 hour, 12 hours, or 14 days.

    If you have multiple models deployed to the endpoint, you can select or deselect models to view or hide metrics for particular models. If you select multiple models, the console groups some model metrics into a single chart. For example, if a metric provides only one value per model, the console groups the model metrics into a single chart, such as CPU usage. For metrics that can have multiple values per model, the console provides a chart for each model. For example, the console provides a response code chart for each model.

Vertex AI Feature Store (Legacy) monitoring metrics

After you build a feature store using Vertex AI Feature Store (Legacy), you can monitor its performance and resource utilization, such as the online storage serving latencies or the number of online storage nodes. For example, you might want to monitor the changes to the online storage serving metrics after updating the number of online storage nodes of a featurestore.

In Cloud Monitoring, the monitored resource type for a featurestore is aiplatform.googleapis.com/Featurestore.

Metrics

  • Request size: The request size by entity type in your featurestore.
  • Offline storage write for streaming write: The number of streaming write requests processed for the offline storage.
  • Streaming write to offline storage delay time: The time elapsed (in seconds) between calling the write API and writing to the offline storage.
  • Node count: The number of online serving nodes for your featurestore.
  • Latency: The total time that an online serving or streaming ingestion request spends in the service.
  • Queries per second: The number of online serving or streaming ingestion queries that your featurestore handles.
  • Errors percentage: The percentage of errors that your featurestore produces when handling online serving or streaming ingestion requests.
  • CPU utilization: The fraction of CPU allocated by the featurestore that's being utilized by the online storage. This number can exceed 100% if the online serving storage is overloaded. Consider increasing the featurestore's number of online serving nodes to reduce CPU utilization.
  • CPU utilization - hottest node: The CPU load for the hottest node in the featurestore's online storage.
  • Total offline storage: Amount of data stored in the featurestore's offline storage.
  • Total online storage: Amount of data stored in the featurestore's online storage.
  • Online serving throughput: The throughput for online serving requests, in MBps.
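
For example, the following sketch reads the CPU utilization metric described above and flags featurestores that look overloaded. The metric type featurestore/cpu_load, the resource label name, and the 0.7 threshold (and its unit) are all assumptions; confirm them against the aiplatform metrics list.

```python
import time

from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        "filter": (
            'metric.type = "aiplatform.googleapis.com/featurestore/cpu_load" '
            'AND resource.type = "aiplatform.googleapis.com/Featurestore"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # The label name for the featurestore ID is assumed.
    featurestore = series.resource.labels.get("resource_id", "unknown")
    latest = series.points[0].value.double_value
    print(featurestore, latest)
    if latest > 0.7:  # assumed to be a fraction; adjust if it's a percentage
        print("  -> consider adding online serving nodes")
```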

View featurestore monitoring metric charts

  1. Go to the Vertex AI Features page in the Google Cloud console.

  2. In the Featurestore column, click the name of a featurestore to view its metrics.

    You can select different chart intervals to see metric values over a particular time period, such as 1 hour, 1 day, or 1 week.

    For some online serving metrics, you can choose to view metrics for a particular method, which further breaks down metrics by entity type. For example, you can view the latency for the ReadFeatureValues method or the StreamingReadFeatureValues method.

Vertex AI Feature Store monitoring metrics

After you set up online serving using Vertex AI Feature Store, you can monitor its performance and resource utilization. For example, you can monitor the CPU loads, the number of nodes for Optimized online serving, and the number of serving requests.

In Cloud Monitoring, the monitored resource type for an online store instance is aiplatform.googleapis.com/FeatureOnlineStore.

Metrics

  • Bytes stored: The amount of data in bytes in the online store instance.

  • CPU load: The average CPU load of nodes in the online store instance.

  • CPU load (hottest node): The CPU load of the hottest node in the online store instance.

  • Node count: The number of online serving nodes for an online store instance configured for Bigtable online serving.

  • Optimized node count: The number of online serving nodes for an online store instance configured for Optimized online serving.

  • Request count: The number of requests received by the online store instance.

  • Request latency: The server-side request latency of the online store instance.

  • Response byte count: The amount of data in bytes sent in online serving responses.

  • Serving data ages: The serving data age in seconds, measured as the difference between the current time and the time of the last sync.

  • Running syncs: The number of running syncs at a given point in time.

  • Serving data by synced time: Breakdown of data in the online store instance by the synced timestamp.
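
For example, to total up the request count for each online store instance over the last hour, a query along the following lines should work. The metric type and label names here are guesses patterned on the list above; confirm the exact names in the aiplatform metrics reference before use.

```python
import time

from google.cloud import monitoring_v3

project_id = "your-project-id"  # placeholder

client = monitoring_v3.MetricServiceClient()
now = int(time.time())

results = client.list_time_series(
    request={
        "name": f"projects/{project_id}",
        # Metric type assumed; confirm it in the aiplatform metrics reference.
        "filter": (
            'metric.type = "aiplatform.googleapis.com/featureonlinestore/online_serving/request_count" '
            'AND resource.type = "aiplatform.googleapis.com/FeatureOnlineStore"'
        ),
        "interval": monitoring_v3.TimeInterval(
            {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
        ),
        # Sum the delta request counts within each 5-minute window.
        "aggregation": monitoring_v3.Aggregation(
            {
                "alignment_period": {"seconds": 300},
                "per_series_aligner": monitoring_v3.Aggregation.Aligner.ALIGN_SUM,
            }
        ),
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    # The label name for the online store ID is assumed.
    store = series.resource.labels.get("feature_online_store_id", "unknown")
    total = sum(int(p.value.int64_value) for p in series.points)
    print(f"{store}: {total} requests in the last hour")
```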
