[RFC][dashboard] Use aiohttp client for inter dependencies. #49932

Open
rynewang wants to merge 1 commit into ray-project:master from rynewang:train-http-get-actors-rfcx

rynewang (Contributor) commented Jan 18, 2025 (edited)

This PR is a request for comment on a design of using HTTP requests for inter-Head dependencies. It removes TrainHead usage of DataSource.

Background

TrainHead uses DataOrganizer.get_actor_infos to get facts about Actors. This can't easily be reduced to simple, single GcsClient calls, because the result is a merge of Actor infos and Worker infos (e.g. actor["gpus"][0]["processesPids"] comes from DataSource.node_physical_stats, which traces back to the GCS GetAllResourceUsage RPC).
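To make the merge concrete, here is a minimal sketch of attaching per-node GPU stats to an actor record. It is not the actual DataOrganizer code: the function name, the "nodeId" and "pid" keys, and the shape of each "processesPids" entry (a dict with a "pid" key) are assumptions for illustration.

```python
from typing import Any


def merge_actor_with_gpu_stats(
    actor: dict[str, Any],
    node_physical_stats: dict[str, dict[str, Any]],
) -> dict[str, Any]:
    """Return a copy of `actor` with a "gpus" list of the GPUs its process uses.

    Hypothetical shapes: `actor` carries "nodeId" and "pid";
    `node_physical_stats` maps node ID -> {"gpus": [{"processesPids": [{"pid": ...}]}]}.
    """
    node_stats = node_physical_stats.get(actor["nodeId"], {})
    gpus = [
        gpu
        for gpu in node_stats.get("gpus", [])
        # Keep a GPU only if the actor's worker process is running on it.
        if any(proc["pid"] == actor["pid"] for proc in gpu.get("processesPids", []))
    ]
    # Build a new dict so the caller's actor record is left untouched.
    return {**actor, "gpus": gpus}
```

The point of the sketch is that the "gpus" field needs two sources (actor metadata and node resource usage), which is why a single GCS call does not cover it.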

Proposal

Let TrainHead depend on ActorHead by making HTTP requests to it directly. The overhead should be small, since both are guaranteed to live on the same node.
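A rough sketch of what such a call from TrainHead could look like with the aiohttp client. The GET /logical/actors path, the actor_ids parameter, and the Cache-Control: no-cache header come from the Changes list in this PR; the base URL, the helper names, and the comma-separated encoding of actor_ids are assumptions, not the PR's actual code.

```python
from typing import Optional, Sequence

# Assumption: the dashboard HTTP server is reachable at this local address.
DASHBOARD_URL = "http://127.0.0.1:8265"


def build_actors_request(
    actor_ids: Optional[Sequence[str]] = None,
    base_url: str = DASHBOARD_URL,
) -> tuple[str, dict[str, str], dict[str, str]]:
    """Return (url, params, headers) for GET /logical/actors."""
    params: dict[str, str] = {}
    if actor_ids:
        # Assumption: IDs are sent as one comma-separated query value.
        params["actor_ids"] = ",".join(actor_ids)
    # Ask the cache middleware for fresh data rather than a cached response.
    headers = {"Cache-Control": "no-cache"}
    return f"{base_url}/logical/actors", params, headers


async def fetch_actors(actor_ids: Optional[Sequence[str]] = None) -> dict:
    """Call the ActorHead endpoint and return the decoded JSON body."""
    # Third-party dependency, imported here so the helper above needs only stdlib.
    import aiohttp

    url, params, headers = build_actors_request(actor_ids)
    async with aiohttp.ClientSession() as session:
        async with session.get(url, params=params, headers=headers) as resp:
            resp.raise_for_status()
            return await resp.json()
```

In a long-running head, reusing one ClientSession across calls would likely be preferable to opening a new session per request.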

Scope?

We will read directly from GCS wherever possible. For cases like this one, where adapting is not trivial, the call frequency is low, and the path is (maybe) non-critical, we can use an HTTP client.

Changes

  • In our cache middleware, add support for Cache-Control: no-cache.
  • In ActorHead, add an optional param actor_ids to the API GET /logical/actors.
  • In TrainHead, call that API with no-cache.
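The first change can be sketched roughly as follows. This is a simplified, synchronous stand-in for the dashboard's cache middleware, not its real implementation: the TtlCache class, its method names, and the choice to store the freshly computed value after a no-cache request are all assumptions.

```python
import time
from typing import Any, Callable, Mapping


def requests_no_cache(headers: Mapping[str, str]) -> bool:
    """True if the request's Cache-Control header carries the no-cache directive."""
    value = ""
    for name, header_value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "cache-control":
            value = header_value
    directives = {d.strip().lower() for d in value.split(",")}
    return "no-cache" in directives


class TtlCache:
    """Hypothetical response cache keyed by request path."""

    def __init__(self, ttl_seconds: float) -> None:
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, Any]] = {}

    def get(self, key: str, compute: Callable[[], Any], headers: Mapping[str, str]) -> Any:
        now = time.monotonic()
        entry = self._entries.get(key)
        fresh = entry is not None and now - entry[0] < self._ttl
        if fresh and not requests_no_cache(headers):
            return entry[1]
        # Cache miss, expired entry, or explicit no-cache: recompute and store.
        value = compute()
        self._entries[key] = (now, value)
        return value
```

With this shape, ordinary dashboard requests keep hitting the cache, while TrainHead's no-cache request always reaches the handler and also refreshes the entry for later callers.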

Alternative

Currently, ResourceUsage is subscribed to by ReportHead and written to DataSource.node_physical_stats (moving to NodeHead in #49878). To achieve "true isolation" we would need to define a way to get a snapshot of ResourceUsage for a given Node, which could be a much larger change.

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>

stalebot commented Feb 24, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stalebot added the stale label ("The issue is stale. It will be closed within 7 days unless there are further conversation") Feb 24, 2025