Cluster Launcher Commands
This document overviews common commands for using the Ray cluster launcher. See the Cluster Configuration docs on how to customize the configuration file.
Launching a cluster (ray up)
This will start up the machines in the cloud, install your dependencies and run any setup commands that you have, configure the Ray cluster automatically, and prepare you to scale your distributed system. See the documentation for ray up. The example config files can be accessed here.
Tip
The worker nodes will start only after the head node has finished starting. To monitor the progress of the cluster setup, you can run ray monitor <cluster yaml>.
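For example, assuming your cluster config file is named cluster.yaml:

$ ray monitor cluster.yaml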
# Replace '<your_backend>' with one of: 'aws', 'gcp', 'kubernetes', or 'local'.
$ BACKEND=<your_backend>

# Create or update the cluster.
$ ray up ray/python/ray/autoscaler/$BACKEND/example-full.yaml

# Tear down the cluster.
$ ray down ray/python/ray/autoscaler/$BACKEND/example-full.yaml
Updating an existing cluster (ray up)
If you want to update your cluster configuration (add more files, change dependencies), run ray up again on the existing cluster.
This command checks if the local configuration differs from the applied configuration of the cluster. This includes any changes to synced files specified in the file_mounts section of the config. If so, the new files and config will be uploaded to the cluster. Following that, Ray services/processes will be restarted.
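For example, adding a new entry to the file_mounts section and rerunning ray up will upload it (a minimal sketch; both paths are placeholders):

file_mounts: {
    # New entry: the next ray up detects the change and resyncs this path.
    "/home/ubuntu/data": "/path/to/local/data",
}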
Tip
Don’t do this for the cloud provider specifications (e.g., change from AWS to GCP on a running cluster) or change the cluster name (as this will just start a new cluster and orphan the original one).
You can also run ray up to restart a cluster if it seems to be in a bad state (this will restart all Ray services even if there are no config changes).
Running ray up on an existing cluster will do all the following:

1. If the head node matches the cluster specification, the file mounts will be reapplied and the setup_commands and ray start commands will be run. There may be some caching behavior here to skip setup/file mounts.
2. If the head node is out of date from the specified YAML (e.g., head_node_type has changed on the YAML), then the out-of-date node will be terminated and a new node will be provisioned to replace it. Setup/file mounts/ray start will be applied.
3. After the head node reaches a consistent state (after ray start commands are finished), the same procedure will be applied to all the worker nodes. The ray start commands tend to run a ray stop + ray start, so this will kill currently running jobs.
If you don’t want the update to restart services (e.g., because the changes don’t require a restart), pass --no-restart to the update call.

If you want to force re-generation of the config to pick up possible changes in the cloud environment, pass --no-config-cache to the update call.

If you want to skip the setup commands and only run ray stop/ray start on all nodes, pass --restart-only to the update call.
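For example (a sketch, assuming your config file is named cluster.yaml):

# Resync files and apply config changes without restarting Ray services.
$ ray up cluster.yaml --no-restart

# Force re-generation of the config to pick up cloud environment changes.
$ ray up cluster.yaml --no-config-cache

# Skip setup commands and just run ray stop/ray start on all nodes.
$ ray up cluster.yaml --restart-only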
See the documentation for ray up.
# Reconfigure autoscaling behavior without interrupting running jobs.
$ ray up ray/python/ray/autoscaler/$BACKEND/example-full.yaml \
    --max-workers=N --no-restart
Running shell commands on the cluster (ray exec)
You can use ray exec to conveniently run commands on clusters. See the documentation for ray exec.
# Run a command on the cluster
$ ray exec cluster.yaml 'echo "hello world"'

# Run a command on the cluster, starting it if needed
$ ray exec cluster.yaml 'echo "hello world"' --start

# Run a command on the cluster, stopping the cluster after it finishes
$ ray exec cluster.yaml 'echo "hello world"' --stop

# Run a command on a new cluster called 'experiment-1', stopping it after
$ ray exec cluster.yaml 'echo "hello world"' \
    --start --stop --cluster-name experiment-1

# Run a command in a detached tmux session
$ ray exec cluster.yaml 'echo "hello world"' --tmux

# Run a command in a screen (experimental)
$ ray exec cluster.yaml 'echo "hello world"' --screen
If you want to run applications on the cluster that are accessible from a web browser (e.g., Jupyter notebook), you can use the --port-forward option. The local port opened is the same as the remote port.
$ ray exec cluster.yaml --port-forward=8899 'source ~/anaconda3/bin/activate tensorflow_p36 && jupyter notebook --port=8899'
Note
For Kubernetes clusters, the port-forward option cannot be used while executing a command. To port forward and run a command you need to call ray exec twice separately.
Running Ray scripts on the cluster (ray submit)
You can also use ray submit to execute Python scripts on clusters. This will rsync the designated file onto the cluster head node and execute it with the given arguments. See the documentation for ray submit.
# Run a Python script in a detached tmux session
$ ray submit cluster.yaml --tmux --start --stop tune_experiment.py

# Run a Python script with arguments.
# This executes script.py on the head node of the cluster, using
# the command: python ~/script.py --arg1 --arg2 --arg3
$ ray submit cluster.yaml script.py -- --arg1 --arg2 --arg3
Attaching to a running cluster (ray attach)
You can use ray attach to attach to an interactive screen session on the cluster. See the documentation for ray attach or run ray attach --help.
# Open a screen on the cluster
$ ray attach cluster.yaml

# Open a screen on a new cluster called 'session-1'
$ ray attach cluster.yaml --start --cluster-name=session-1

# Attach to tmux session on cluster (creates a new one if none available)
$ ray attach cluster.yaml --tmux
Synchronizing files from the cluster (ray rsync-up/down)
To download or upload files to the cluster head node, use ray rsync_down or ray rsync_up:
$ ray rsync_down cluster.yaml '/path/on/cluster' '/local/path'
$ ray rsync_up cluster.yaml '/local/path' '/path/on/cluster'
Monitoring cluster status (ray dashboard/status)
Ray also comes with an online dashboard. The dashboard is accessible via HTTP on the head node (by default it listens on localhost:8265). You can also use the built-in ray dashboard command to set up port forwarding automatically, making the remote dashboard viewable in your local browser at localhost:8265.
$ ray dashboard cluster.yaml
You can monitor cluster usage and auto-scaling status by running (on the head node):
$ ray status
To see live updates to the status:
$ watch -n 1 ray status
The Ray autoscaler also reports per-node status in the form of instance tags. In your cloud provider console, you can click on a Node, go to the “Tags” pane, and add the ray-node-status tag as a column. This lets you see per-node statuses at a glance.
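If you prefer the terminal, here is a hedged sketch of the same check using the AWS CLI (assumes the aws backend; the JMESPath query is illustrative, not the only way to do this):

# List each instance ID alongside its ray-node-status tag.
$ aws ec2 describe-instances \
    --filters "Name=tag-key,Values=ray-node-status" \
    --query "Reservations[].Instances[].{ID: InstanceId, Status: Tags[?Key=='ray-node-status'] | [0].Value}" \
    --output table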

Common Workflow: Syncing git branches
A common use case is syncing a particular local git branch to all workers of the cluster. However, if you just put a git checkout <branch> in the setup commands, the autoscaler won’t know when to rerun the command to pull in updates. There is a nice workaround for this by including the git SHA in the input (the hash of the file will change if the branch is updated):
file_mounts: {
    "/tmp/current_branch_sha": "/path/to/local/repo/.git/refs/heads/<YOUR_BRANCH_NAME>",
}

setup_commands:
    - test -e <REPO_NAME> || git clone https://github.com/<REPO_ORG>/<REPO_NAME>.git
    - cd <REPO_NAME> && git fetch && git checkout `cat /tmp/current_branch_sha`
This tells ray up to sync the current git branch SHA from your personal computer to a temporary file on the cluster (assuming you’ve pushed the branch head already). Then, the setup commands read that file to figure out which SHA they should check out on the nodes. Note that each command runs in its own session. The final workflow to update the cluster then becomes just this:
1. Make local changes to a git branch
2. Commit the changes with git commit and git push
3. Update files on your Ray cluster with ray up
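For instance, assuming the file_mounts entry above and a config file named cluster.yaml, one update round looks like:

$ git commit -am "update experiment code"
$ git push origin <YOUR_BRANCH_NAME>
$ ray up cluster.yaml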