Adds new working dir upload protocol PLASMA, and use it in job submission. #45880
base:master
…sions. Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
Hmm, if we cannot make it work in all cases (e.g., ray.init()), I feel like it may be better to just allow an HTTP interface (or allow S3). But I feel like there may be ways to make it work with ray.init(), because only workers are going to need the runtime env.
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
When a user initializes Ray with a working_dir, under the hood we package the user's local directory/zip and upload it to the Ray cluster. However, we upload it to the GCS Internal KV, which may put an unnecessary burden on this global process, which can already be very busy on large clusters.
This PR introduces a new "remote protocol", `plasma`. It spins up a global singleton detached actor, `DataHolder`, which stores bytes in the Object Store. The package uploader invokes a Ray remote method to store the package; the package downloader just does a regular `ray.get` to download it.

Problem: this requires `ray` to be initialized in the first place. So when you do `ray.init(runtime_env={"working_dir": "./"})`, you introduce a circular dependency between Ray and `DataHolder`, which fails. For similar reasons, Ray Client can hardly do this.

Fortunately, Jobs can do that just fine. When you do `ray job submit`, it actually makes an HTTP PUT call to the dashboard JobAgent, which invokes anything needed to save the package. And there, `DataHolder` can work.

In this PR:
What's not changed: `ray.init()` still uses GCS.

In the long run: we can extend the Ray driver script and Ray Client cases to all use HTTP PUT, and remove the GCS code path.
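
To illustrate the split described above (job submission stores packages via `plasma`, while `ray.init()` stays on GCS), here is a minimal, Ray-free sketch. It is not the code in this PR: `InMemoryStore`, `BACKENDS`, `choose_protocol`, `upload_package`, `download_package`, and the package names are all hypothetical, and both backends are faked with in-memory dicts rather than the real GCS Internal KV or the `DataHolder` actor. It only shows the idea of picking a protocol at upload time and dispatching on the URI scheme at download time.

```python
from typing import Dict


class InMemoryStore:
    """Hypothetical stand-in for a storage backend (GCS KV or DataHolder)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


# One fake backend per protocol name. In the PR, "plasma" would be backed by
# the DataHolder actor and "gcs" by the GCS Internal KV.
BACKENDS: Dict[str, InMemoryStore] = {
    "gcs": InMemoryStore(),
    "plasma": InMemoryStore(),
}


def choose_protocol(via_job_submission: bool) -> str:
    # Jobs upload through the dashboard, where Ray is already running, so the
    # actor-backed protocol is usable. ray.init() would hit the circular
    # dependency described above, so it keeps using gcs.
    return "plasma" if via_job_submission else "gcs"


def upload_package(name: str, data: bytes, via_job_submission: bool) -> str:
    """Store the package bytes and return a URI that records the protocol."""
    protocol = choose_protocol(via_job_submission)
    BACKENDS[protocol].put(name, data)
    return f"{protocol}://{name}"


def download_package(uri: str) -> bytes:
    """Dispatch on the URI scheme to find the backend holding the package."""
    protocol, sep, name = uri.partition("://")
    if not sep or protocol not in BACKENDS:
        raise ValueError(f"unsupported package URI: {uri!r}")
    return BACKENDS[protocol].get(name)


# Hypothetical package names, for illustration only.
job_uri = upload_package("pkg_job.zip", b"job-bytes", via_job_submission=True)
init_uri = upload_package("pkg_init.zip", b"init-bytes", via_job_submission=False)
```

Because the protocol is encoded in the URI, a downloader in this sketch does not need to know which path produced the package, which is what would let the two code paths coexist until the GCS path is removed.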