[RFC][dashboard] Use aiohttp client for inter dependencies. #49932
Open
rynewang wants to merge 1 commit into ray-project:master from rynewang:train-http-get-actors-rfcx
Signed-off-by: Ruiyang Wang <rywang014@gmail.com>
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
Labels: stale
This PR is a request for comments on a design that uses HTTP requests for inter-Head dependencies. It removes TrainHead's usage of DataSource.
Background
TrainHead uses `DataOrganizer.get_actor_infos` to get facts about Actors. This can't easily be reduced to simple, single GcsClient calls, because the result is a merge of Actor infos and Worker infos (e.g. `actor["gpus"][0]["processesPids"]` comes from `DataSource.node_physical_stats`, which traces back to the GCS `GetAllResourceUsage` RPC).
Proposal
Let the TrainHead depend on ActorHead by making HTTP requests to it directly. The overhead should be small since both are guaranteed to live on the same node.
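As a rough illustration of the proposal, one Head could fetch actor data from another over local HTTP along these lines. This is only a sketch, not the code in this PR: `fetch_actors` and `DASHBOARD_URL` are hypothetical names, the address and port are assumptions, and the shape of the JSON body is deliberately left uninterpreted. The `session` argument is expected to be an `aiohttp.ClientSession` (or anything with a compatible `get`).

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # aiohttp is only referenced for type hints in this sketch
    import aiohttp

# Assumption: the dashboard HTTP server is reachable on localhost at this port.
DASHBOARD_URL = "http://127.0.0.1:8265"


async def fetch_actors(
    session: "aiohttp.ClientSession", base_url: str = DASHBOARD_URL
) -> Any:
    """Hypothetical helper: GET /logical/actors and return the parsed JSON body."""
    async with session.get(f"{base_url}/logical/actors") as resp:
        # Surface HTTP errors instead of silently returning an error payload.
        resp.raise_for_status()
        return await resp.json()
```

A caller would typically create one `aiohttp.ClientSession` for the lifetime of the Head and reuse it across requests, so that the local connection is kept alive rather than re-established on every call.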
Scope?
We will read directly from GCS wherever possible. For cases like this one, where adapting is not trivial, the call frequency is low, and the data is (maybe) non-critical, we can use the HTTP client.
Changes
`Cache-Control: no-cache`.
`actor_ids` to API `GET /logical/actors`
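To make the two items above concrete, a request that filters `GET /logical/actors` by `actor_ids` and sends `Cache-Control: no-cache` might be assembled as follows. This is a sketch under stated assumptions: `build_actors_request` is a hypothetical helper, and encoding `actor_ids` as a single comma-separated query parameter is a guess, since the wire format is not spelled out here.

```python
from typing import Dict, Iterable, Tuple
from urllib.parse import urlencode


def build_actors_request(
    base_url: str, actor_ids: Iterable[str]
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: return (url, headers) for GET /logical/actors."""
    # Ask for a fresh response rather than a cached one.
    headers = {"Cache-Control": "no-cache"}
    url = f"{base_url}/logical/actors"
    ids = list(actor_ids)
    if ids:
        # Assumption: actor IDs are passed comma-separated in one parameter.
        url += "?" + urlencode({"actor_ids": ",".join(ids)})
    return url, headers
```

The returned pair can be passed to any HTTP client, for example `session.get(url, headers=headers)` with an `aiohttp.ClientSession`.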
Alternative
Currently, ResourceUsage updates are subscribed to by ReportHead and written to `DataSource.node_physical_stats` (moving to NodeHead in #49878). To achieve "true isolation", we would need to define a way to get a snapshot of ResourceUsage for a given Node, which could be a larger change.