Monitor your Ray cluster on Vertex AI

This page describes how to view the logs generated by your Ray clusters on Vertex AI and how to monitor Ray on Vertex AI metrics. It also provides guidance for debugging Ray clusters.

View logs

When you run tasks on your Ray cluster on Vertex AI, logs are automatically generated and stored in both Cloud Logging and the open source Ray dashboard. This section describes how to access these logs from the Google Cloud console.

Before you begin, make sure to read the Ray on Vertex AI overview and set up all the prerequisite tools you need.

Ray OSS dashboard

You can view the open source Ray log files through the Ray OSS dashboard:

  1. In the Google Cloud console, go to the Ray on Vertex AI page.

    Go to the Ray on Vertex AI page

  2. In the row for the cluster you created, click the more actions menu.

  3. Select the Ray OSS dashboard link. The dashboard opens in another tab.

  4. In the menu in the top-right corner, open the Logs view.

  5. Click each node to see the log files associated with that node.
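To read the same per-node log files from a script instead of the dashboard, you can use Ray's state API (`ray.util.state.list_nodes`, `list_logs`, and `get_log`). The following is a minimal sketch. It assumes that it runs somewhere the cluster is reachable, such as the head node interactive shell described in Debug Ray clusters, and that your Ray version includes the state API. The helper names and the default file name `raylet.out` are illustrative choices, not part of Ray on Vertex AI.

```python
from typing import Dict, Iterable, List


def select_log_files(log_index: Dict[str, List[str]], wanted: Iterable[str]) -> List[str]:
    """Pick the wanted file names out of a list_logs() result (category -> file names)."""
    wanted_set = set(wanted)
    matches = {name for files in log_index.values() for name in files if name in wanted_set}
    return sorted(matches)


def print_node_logs(wanted: Iterable[str] = ("raylet.out",), tail: int = 50) -> None:
    # Lazy import: run where Ray is installed and the cluster is reachable,
    # for example the head node interactive shell.
    from ray.util.state import get_log, list_logs, list_nodes

    wanted = list(wanted)  # reused for every node, so materialize it once
    for node in list_nodes():  # one entry per node, like the nodes in the Logs view
        if node.state != "ALIVE":
            continue  # logs of dead nodes can't be fetched from their agents
        index = list_logs(node_id=node.node_id)
        for filename in select_log_files(index, wanted):
            print(f"== {node.node_ip}: {filename} ==")
            # Print the last `tail` lines of the file.
            for line in get_log(node_id=node.node_id, filename=filename, tail=tail):
                print(line)
```

For example, `print_node_logs(["raylet.out", "gcs_server.out"])` prints the end of both files for every live node that has them.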

Cloud Logging console

  1. In the Google Cloud console, go to the Logs Explorer page:

    Go to Logs Explorer

    If you use the search bar to find this page, then select the result whose subheading is Logging.

  2. Select an existing Google Cloud project, folder, or organization.

  3. To display all Ray logs, enter the following query into the query-editor field, and then click Run query:

    resource.labels.task_name="ray-cluster-logs"
  4. To narrow down the logs to a specific Ray cluster, add the following line to the query, and then click Run query:

    labels."ml.googleapis.com/ray_cluster_id"=CLUSTER_NAME

    Replace CLUSTER_NAME with the name of your Ray cluster. To find it, in the Google Cloud console go to Vertex AI > Ray on Vertex AI, which lists the cluster names in each region.

  5. To further narrow down the logs to a specific log file, such as raylet.out, click the name of the log under Log fields > Log name.

  6. You can group similar log entries together:

    1. In Query results, click a log entry to expand it.

    2. In jsonPayload, click the tailed_path value. A drop-down menu appears.

    3. Click Show matching entries.
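You can also express the console steps above as a single query string, for example to reuse it in scripts. The following minimal sketch builds the same filter in Python and, optionally, reads matching entries with the `google-cloud-logging` client library. The function names are hypothetical. Quoting the values and joining the clauses with `AND` follow the Logging query language. The `tailed_path` clause is an assumption based on the `jsonPayload.tailed_path` field used for grouping; the exact format of its values isn't documented here.

```python
from typing import Optional


def ray_log_filter(cluster_name: Optional[str] = None,
                   tailed_path: Optional[str] = None) -> str:
    """Build a Logging query for Ray on Vertex AI logs (hypothetical helper)."""
    clauses = ['resource.labels.task_name="ray-cluster-logs"']  # all Ray logs
    if cluster_name:
        # Narrow to one cluster, as in the console step above.
        clauses.append(f'labels."ml.googleapis.com/ray_cluster_id"="{cluster_name}"')
    if tailed_path:
        # Assumption: the value you would pick under jsonPayload.tailed_path.
        clauses.append(f'jsonPayload.tailed_path="{tailed_path}"')
    return " AND ".join(clauses)


def print_ray_logs(cluster_name: str, max_results: int = 20) -> None:
    # Lazy import: needs `pip install google-cloud-logging` and default credentials.
    from google.cloud import logging as cloud_logging

    client = cloud_logging.Client()  # uses the default project
    entries = client.list_entries(
        filter_=ray_log_filter(cluster_name),
        order_by="timestamp desc",  # newest entries first
        max_results=max_results,
    )
    for entry in entries:
        print(entry.timestamp, entry.payload)
```

You can paste the string returned by `ray_log_filter("my-cluster")` directly into the query-editor field.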

Disable logs

By default, Ray on Vertex AI exports Ray logs to Cloud Logging.

  • To disable the export of Ray logs to Cloud Logging, set the following parameter when you create the cluster with the Vertex AI SDK for Python:

    vertex_ray.create_ray_cluster(..., enable_logging=False, ...)

You can still view the Ray log files on the Ray dashboard when the Ray on Vertex AI Cloud Logging feature is disabled.

Monitor metrics

You can view Ray on Vertex AI metrics in several ways by using Google Cloud Monitoring (GCM). Alternatively, you can export the metrics from GCM to your own Grafana server.

Note: See Google Cloud Managed Service for Prometheus (GMP) for pricing and data storage information.

Monitor metrics in GCM

There are two ways you can view the Ray on Vertex AI metrics in GCM.

  • Use the direct view under Metrics Explorer.
  • Import the Grafana dashboard.

Metrics Explorer

To use the direct view under Metrics Explorer, follow these steps:

  1. Go to the Google Cloud Monitoring console.
  2. Under Explore, select Metrics explorer.
  3. Under Active Resources, select Prometheus Target. Active Metric Categories appears.
  4. SelectRay.

    A list of metrics appears.
  5. Select the metrics you want to monitor. For example:
    1. Choose CPU utilization percentage as the monitored metric.
    2. Add a filter. For example, select cluster and use the cluster ID to monitor the selected metrics for a specific cluster only. To locate your cluster ID, follow these steps:
      1. In the Google Cloud console, go to the Ray page.

        Go to Ray

      2. Make sure you're in the project that contains your cluster.
      3. The Name column lists the cluster IDs.
    3. Select the Aggregation method for viewing the metrics. For example, you can view unaggregated metrics, which show the CPU utilization of each Ray process.
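You can request the same view as a PromQL query, which Cloud Monitoring accepts for Managed Service for Prometheus data. The following minimal sketch relies on several assumptions that you should confirm in Metrics Explorer before you rely on them:

- The CPU utilization metric is exported under Ray's metric name `ray_node_cpu_utilization`.
- The `cluster` label holds the cluster ID.
- The cluster ID is the last segment of the cluster resource name.
- The GMP Prometheus HTTP query endpoint shown in the code is available for your project.

All helper names are hypothetical.

```python
from typing import Optional


def cluster_id_from_resource_name(resource_name: str) -> str:
    """Return the last path segment, assumed to be the cluster ID."""
    return resource_name.rstrip("/").split("/")[-1]


def cpu_utilization_promql(cluster_id: str, aggregation: Optional[str] = None) -> str:
    """PromQL for the CPU utilization series, optionally wrapped in e.g. 'avg'."""
    # Assumed metric and label names; confirm them in Metrics Explorer.
    selector = f'ray_node_cpu_utilization{{cluster="{cluster_id}"}}'
    return f"{aggregation}({selector})" if aggregation else selector


def query_gcm(project_id: str, promql: str) -> dict:
    """Run an instant PromQL query against the GMP HTTP API (sketch)."""
    # Lazy imports: need google-auth and requests, plus default credentials.
    import google.auth
    from google.auth.transport.requests import AuthorizedSession

    credentials, _ = google.auth.default(
        scopes=["https://www.googleapis.com/auth/monitoring.read"]
    )
    session = AuthorizedSession(credentials)
    url = (
        f"https://monitoring.googleapis.com/v1/projects/{project_id}"
        "/location/global/prometheus/api/v1/query"
    )
    response = session.post(url, data={"query": promql})
    response.raise_for_status()
    return response.json()
```

Without an `aggregation` argument, the query returns one series per label combination, which mirrors the unaggregated view. With `"avg"`, it returns a single cluster-wide series.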

GCM dashboard

To import a Grafana dashboard for Ray on Vertex AI, follow the Cloud Monitoring guide Import your own Grafana dashboard.

All you need is a Grafana dashboard JSON file. Open source Ray supports this manual setup by providing a default Grafana dashboard JSON file.
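Before importing a dashboard, you can check which metrics it queries. The following minimal sketch walks the standard Grafana dashboard model: `panels`, the nested `panels` inside rows, and each panel's `targets[].expr`. The directory where Ray writes its default dashboards is an assumption taken from open source Ray and may differ between Ray versions. The helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import List

# Assumption: open source Ray writes its default dashboards here on the head node.
DEFAULT_DASHBOARD_DIR = Path("/tmp/ray/session_latest/metrics/grafana/dashboards")


def panel_queries(dashboard: dict) -> List[str]:
    """Collect the PromQL expressions used by the panels of a Grafana dashboard."""
    exprs = []
    pending = list(dashboard.get("panels", []))
    while pending:
        panel = pending.pop(0)
        pending.extend(panel.get("panels", []))  # panels nested inside collapsed rows
        for target in panel.get("targets", []):
            if target.get("expr"):
                exprs.append(target["expr"])
    return exprs


def load_dashboard(path: Path) -> dict:
    """Read a Grafana dashboard JSON file."""
    with path.open() as f:
        return json.load(f)
```

For example, on the head node you could list the queries of every default dashboard with `for path in DEFAULT_DASHBOARD_DIR.glob("*.json"): print(path.name, panel_queries(load_dashboard(path)))`.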

Monitor metrics from user-owned Grafana

If you already have a Grafana server running, you can also export all Ray on Vertex AI Prometheus metrics to it. To do so, follow the GMP Query using Grafana guidance. It shows you how to add a new Grafana data source to your existing Grafana server and how to use the data source syncer to sync the new Grafana Prometheus data source with the Ray on Vertex AI metrics.

You must configure and authenticate the newly added Grafana data source by using the data source syncer. Follow the steps in Configure and authenticate the Grafana data source.

After the data source is synced, you can create and add any dashboards you need based on the Ray on Vertex AI metrics.

By default, Ray on Vertex AI metrics collection is enabled. To disable it, set the following parameter when you create the cluster with the Vertex AI SDK for Python:

vertex_ray.create_ray_cluster(..., enable_metrics_collection=False, ...)
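You can set the logging and metrics flags in the same call. The following sketch creates a cluster with both Cloud Logging export (`enable_logging`) and metrics collection (`enable_metrics_collection`) turned off. It makes these assumptions:

- The SDK was installed with the `ray` extra.
- `vertex_ray` and `vertex_ray.Resources` can be imported from `google.cloud.aiplatform` in your SDK version.
- Default machine types and the default project and location are acceptable.

The helper names are hypothetical.

```python
from typing import Dict


def observability_kwargs(export_logs: bool = True, collect_metrics: bool = True) -> Dict[str, bool]:
    """Keyword arguments for create_ray_cluster; both features are on by default."""
    return {"enable_logging": export_logs, "enable_metrics_collection": collect_metrics}


def create_cluster_without_observability(cluster_name: str) -> str:
    # Lazy imports: assumes `pip install "google-cloud-aiplatform[ray]"`.
    # The import path for vertex_ray may differ between SDK versions.
    from google.cloud import aiplatform
    from google.cloud.aiplatform import vertex_ray

    aiplatform.init()  # default project and location from the environment
    return vertex_ray.create_ray_cluster(
        head_node_type=vertex_ray.Resources(),       # default head node resources
        worker_node_types=[vertex_ray.Resources()],  # one default worker pool
        cluster_name=cluster_name,
        **observability_kwargs(export_logs=False, collect_metrics=False),
    )
```

Keeping the two flags in one helper makes it easy to turn off only one of them, for example `observability_kwargs(export_logs=False)` to keep metrics while skipping log export.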

Debug Ray clusters

To debug Ray clusters, use the Head node interactive shell.

Note: Use the interactive shell only for debugging or for other advanced operations that aren't supported in other ways. It's not recommended for normal operations such as running workloads.

Google Cloud console

To access the Head node interactive shell, do the following:

  1. In the Google Cloud console, go to the Ray on Vertex AI page.
    Go to Ray on Vertex AI
  2. Make sure you're in the correct project.
  3. Select the cluster you want to examine. The Basic info section appears.
  4. In the Access links section, click the link for Head node interactive shell. The head node interactive shell opens.
  5. Follow the instructions in Monitor and debug training with an interactive shell.
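In the head node interactive shell, a quick health check is often a useful first debugging step. The following minimal sketch does these things:

- Connects to the running cluster with `ray.init(address="auto")`.
- Prints the total and available resources.
- Reports which nodes Ray considers alive or dead, based on the `Alive` and `NodeManagerAddress` fields returned by `ray.nodes()`.

It assumes that Python and Ray are available in the shell. `summarize_nodes` and `check_cluster` are hypothetical helper names.

```python
from typing import Dict, List


def summarize_nodes(nodes: List[dict]) -> Dict[str, List[str]]:
    """Split ray.nodes() entries into alive and dead node addresses."""
    summary: Dict[str, List[str]] = {"alive": [], "dead": []}
    for node in nodes:
        key = "alive" if node.get("Alive") else "dead"
        summary[key].append(node.get("NodeManagerAddress", "unknown"))
    return summary


def check_cluster() -> None:
    # Lazy import: run inside the head node interactive shell, where Ray is installed.
    import ray

    ray.init(address="auto")  # attach to the already-running cluster
    print("Total resources:", ray.cluster_resources())
    print("Available resources:", ray.available_resources())
    print("Nodes:", summarize_nodes(ray.nodes()))
    ray.shutdown()  # disconnects this driver; the cluster keeps running
```

A node listed under `dead`, or a large gap between total and available resources, tells you where to look next in the logs.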


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.