
Cluster Launcher Commands#

This document overviews common commands for using the Ray cluster launcher. See the Cluster Configuration docs on how to customize the configuration file.

Launching a cluster (ray up)#

This will start up the machines in the cloud, install your dependencies and run any setup commands that you have, configure the Ray cluster automatically, and prepare you to scale your distributed system. See the documentation for ray up. The example config files can be accessed here.

Tip

The worker nodes will start only after the head node has finished starting. To monitor the progress of the cluster setup, you can run ray monitor <cluster yaml>.

# Replace '<your_backend>' with one of: 'aws', 'gcp', 'kubernetes', or 'local'.
$ BACKEND=<your_backend>

# Create or update the cluster.
$ ray up ray/python/ray/autoscaler/$BACKEND/example-full.yaml

# Tear down the cluster.
$ ray down ray/python/ray/autoscaler/$BACKEND/example-full.yaml
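While the cluster is coming up, you can tail the autoscaler output from a separate terminal to follow node setup, as the tip above suggests. A minimal sketch, reusing the same example config file:

# Follow the autoscaler/setup logs while the cluster is starting.
$ ray monitor ray/python/ray/autoscaler/$BACKEND/example-full.yaml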

Updating an existing cluster (ray up)#

If you want to update your cluster configuration (add more files, change dependencies), run ray up again on the existing cluster.

This command checks if the local configuration differs from the applied configuration of the cluster. This includes any changes to synced files specified in the file_mounts section of the config. If so, the new files and config will be uploaded to the cluster. Following that, Ray services/processes will be restarted.
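For example, a sketch of that round trip; the cluster config name and the mounted directory below are illustrative placeholders, not paths from this document:

# In your cluster config (e.g. cluster.yaml), add a new directory to sync to every node:
# file_mounts:
#     "/home/ubuntu/data": "/path/to/local/data"

# Then re-apply the config; changed file mounts are re-synced and Ray is restarted.
$ ray up cluster.yaml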

Tip

Don’t do this for the cloud provider specifications (e.g., change from AWS to GCP on a running cluster) or change the cluster name (as this will just start a new cluster and orphan the original one).

You can also run ray up to restart a cluster if it seems to be in a bad state (this will restart all Ray services even if there are no config changes).

Running ray up on an existing cluster will do all the following:

  • If the head node matches the cluster specification, the file mounts will be reapplied and the setup_commands and ray start commands will be run. There may be some caching behavior here to skip setup/file mounts.

  • If the head node is out of date from the specified YAML (e.g., head_node_type has changed on the YAML), then the out-of-date node will be terminated and a new node will be provisioned to replace it. Setup/file mounts/ray start will be applied.

  • After the head node reaches a consistent state (after ray start commands are finished), the same above procedure will be applied to all the worker nodes. The ray start commands tend to run a ray stop + ray start, so this will kill currently working jobs.

If you don’t want the update to restart services (e.g., because the changes don’t require a restart), pass --no-restart to the update call.

If you want to force re-generation of the config to pick up possible changes in the cloud environment, pass --no-config-cache to the update call.

If you want to skip the setup commands and only run ray stop/ray start on all nodes, pass --restart-only to the update call.

See the documentation for ray up.

# Reconfigure autoscaling behavior without interrupting running jobs.
$ ray up ray/python/ray/autoscaler/$BACKEND/example-full.yaml \
    --max-workers=N --no-restart
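The --restart-only and --no-config-cache flags described above combine with ray up in the same way; a short sketch, reusing the same example config:

# Skip the setup commands and only run ray stop/ray start on all nodes.
$ ray up ray/python/ray/autoscaler/$BACKEND/example-full.yaml --restart-only

# Force re-generation of the cached config to pick up cloud-environment changes.
$ ray up ray/python/ray/autoscaler/$BACKEND/example-full.yaml --no-config-cache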

Running shell commands on the cluster (ray exec)#

You can use ray exec to conveniently run commands on clusters. See the documentation for ray exec.

# Run a command on the cluster
$ ray exec cluster.yaml 'echo "hello world"'

# Run a command on the cluster, starting it if needed
$ ray exec cluster.yaml 'echo "hello world"' --start

# Run a command on the cluster, stopping the cluster after it finishes
$ ray exec cluster.yaml 'echo "hello world"' --stop

# Run a command on a new cluster called 'experiment-1', stopping it after
$ ray exec cluster.yaml 'echo "hello world"' \
    --start --stop --cluster-name experiment-1

# Run a command in a detached tmux session
$ ray exec cluster.yaml 'echo "hello world"' --tmux

# Run a command in a screen (experimental)
$ ray exec cluster.yaml 'echo "hello world"' --screen

If you want to run applications on the cluster that are accessible from a web browser (e.g., Jupyter notebook), you can use the --port-forward option. The local port opened is the same as the remote port.

$ ray exec cluster.yaml --port-forward=8899 'source ~/anaconda3/bin/activate tensorflow_p36 && jupyter notebook --port=8899'

Note

For Kubernetes clusters, the port-forward option cannot be used while executing a command. To port forward and run a command you need to call ray exec twice separately.
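A sketch of that two-call pattern on Kubernetes; the port number and the long-running placeholder command are illustrative assumptions, not from this document:

# Terminal 1: hold the port forward open (placeholder no-op command, assumed for illustration).
$ ray exec cluster.yaml --port-forward=8899 'sleep infinity'

# Terminal 2: run the actual command in a separate invocation.
$ ray exec cluster.yaml 'source ~/anaconda3/bin/activate tensorflow_p36 && jupyter notebook --port=8899'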

Running Ray scripts on the cluster (ray submit)#

You can also use ray submit to execute Python scripts on clusters. This will rsync the designated file onto the head node cluster and execute it with the given arguments. See the documentation for ray submit.

# Run a Python script in a detached tmux session
$ ray submit cluster.yaml --tmux --start --stop tune_experiment.py

# Run a Python script with arguments.
# This executes script.py on the head node of the cluster, using
# the command: python ~/script.py --arg1 --arg2 --arg3
$ ray submit cluster.yaml script.py -- --arg1 --arg2 --arg3

Attaching to a running cluster (ray attach)#

You can use ray attach to attach to an interactive screen session on the cluster. See the documentation for ray attach or run ray attach --help.

# Open a screen on the cluster
$ ray attach cluster.yaml

# Open a screen on a new cluster called 'session-1'
$ ray attach cluster.yaml --start --cluster-name=session-1

# Attach to tmux session on cluster (creates a new one if none available)
$ ray attach cluster.yaml --tmux

Synchronizing files from the cluster (ray rsync-up/down)#

To download or upload files to the cluster head node, use ray rsync_down or ray rsync_up:

$ ray rsync_down cluster.yaml '/path/on/cluster' '/local/path'
$ ray rsync_up cluster.yaml '/local/path' '/path/on/cluster'
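As a concrete usage sketch, you might pull the head node's logs to your machine for offline inspection; the log directory below is an assumption about a typical Ray head node layout, not a path from this document:

# Download the head node's Ray session logs for offline inspection.
$ ray rsync_down cluster.yaml '/tmp/ray/session_latest/logs/' './head-node-logs/'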

Monitoring cluster status (ray dashboard/status)#

Ray also comes with an online dashboard. The dashboard is accessible via HTTP on the head node (by default it listens on localhost:8265). You can also use the built-in ray dashboard command to set up port forwarding automatically, making the remote dashboard viewable in your local browser at localhost:8265.

$ ray dashboard cluster.yaml

You can monitor cluster usage and auto-scaling status by running (on the head node):

$ ray status

To see live updates to the status:

$ watch -n 1 ray status

The Ray autoscaler also reports per-node status in the form of instance tags. In your cloud provider console, you can click on a Node, go to the “Tags” pane, and add the ray-node-status tag as a column. This lets you see per-node statuses at a glance:

[Image: autoscaler-status.png, showing per-node statuses via the ray-node-status tag column in the cloud console]
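If you prefer the command line over the console, the same tag can be queried with your cloud provider's CLI. A sketch for the AWS backend; the aws CLI invocation is an illustration, not part of the Ray tooling:

# List instances together with their ray-node-status tag value (AWS example).
$ aws ec2 describe-instances \
    --filters "Name=tag-key,Values=ray-node-status" \
    --query "Reservations[].Instances[].[InstanceId, Tags[?Key=='ray-node-status'] | [0].Value]" \
    --output table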

Common Workflow: Syncing git branches#

A common use case is syncing a particular local git branch to all workers of the cluster. However, if you just put a git checkout <branch> in the setup commands, the autoscaler won’t know when to rerun the command to pull in updates. There is a nice workaround for this by including the git SHA in the input (the hash of the file will change if the branch is updated):

file_mounts: {
    "/tmp/current_branch_sha": "/path/to/local/repo/.git/refs/heads/<YOUR_BRANCH_NAME>",
}

setup_commands:
    - test -e <REPO_NAME> || git clone https://github.com/<REPO_ORG>/<REPO_NAME>.git
    - cd <REPO_NAME> && git fetch && git checkout `cat /tmp/current_branch_sha`

This tells ray up to sync the current git branch SHA from your personal computer to a temporary file on the cluster (assuming you’ve pushed the branch head already). Then, the setup commands read that file to figure out which SHA they should check out on the nodes. Note that each command runs in its own session. The final workflow to update the cluster then becomes just this (a shell sketch follows the list):

  1. Make local changes to a git branch

  2. Commit the changes with git commit and git push

  3. Update files on your Ray cluster with ray up
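Put together as shell commands, one update cycle looks roughly like this; the commit message, branch name, and config file name are placeholders:

# Steps 1 and 2: commit and push the local branch.
$ git commit -am "describe your change"
$ git push origin <YOUR_BRANCH_NAME>

# Step 3: re-run ray up; the changed SHA file in file_mounts triggers the checkout on the nodes.
$ ray up cluster.yaml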

