[RFC][dashboard] Use aiohttp client for inter dependencies. #49932

rynewang (Contributor) commented Jan 18, 2025 (edited)

This PR requests comments on a design that uses HTTP requests for dependencies between dashboard Heads. It removes TrainHead's use of DataSource.

Background

TrainHead uses DataOrganizer.get_actor_infos to get facts about Actors. This can't easily be reduced to single GcsClient calls, because the result is a merge of Actor infos and Worker infos. For example, actor["gpus"][0]["processesPids"] comes from DataSource.node_physical_stats, which is ultimately populated by the GCS GetAllResourceUsage RPC.
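To make the merge concrete, here is a minimal sketch of how per-node physical stats might be folded into an actor record. The record shapes are assumptions for illustration: only the keys gpus and processesPids appear in this description, while nodeId, pid, actorId, index and the function name merge_actor_with_node_stats are hypothetical and are not taken from DataOrganizer.

```python
from typing import Any, Dict


def merge_actor_with_node_stats(
    actor: Dict[str, Any],
    node_physical_stats: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the actor record with the GPUs its process is using.

    Hypothetical shapes: `actor` carries "nodeId" and "pid";
    `node_physical_stats` maps a node id to a dict whose "gpus" list holds
    entries with a "processesPids" list of {"pid": ...} dicts.
    """
    node_stats = node_physical_stats.get(actor["nodeId"], {})
    pid = actor["pid"]
    gpus = [
        gpu
        for gpu in node_stats.get("gpus", [])
        if any(proc["pid"] == pid for proc in gpu.get("processesPids", []))
    ]
    return {**actor, "gpus": gpus}


# Two sources of truth: actor facts and per-node resource usage.
actor = {"actorId": "a1", "nodeId": "n1", "pid": 101}
node_physical_stats = {
    "n1": {
        "gpus": [
            {"index": 0, "processesPids": [{"pid": 101}]},
            {"index": 1, "processesPids": [{"pid": 202}]},
        ]
    }
}

merged = merge_actor_with_node_stats(actor, node_physical_stats)
```

Because the GPU data originates from resource-usage reports rather than from the actor table, a single actor lookup cannot produce the merged record on its own.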

Proposal

Let TrainHead depend on ActorHead by making HTTP requests to it directly. The overhead should be small, since both are guaranteed to live on the same node.
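A rough sketch of what the caller side could look like, using only the standard library to build the request. The base URL, the comma-separated encoding of actor_ids, and the helper names build_actor_request and parse_actor_ids are assumptions made for illustration; the actual wire format is whatever this PR's ActorHead change defines.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_actor_request(
    base_url: str, actor_ids: List[str]
) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a same-node call to GET /logical/actors.

    Assumption: actor ids are sent as one comma-separated query parameter.
    """
    query = urlencode({"actor_ids": ",".join(actor_ids)})
    url = f"{base_url}/logical/actors?{query}"
    # Ask the server-side cache to skip any stored response.
    headers = {"Cache-Control": "no-cache"}
    return url, headers


def parse_actor_ids(url: str) -> List[str]:
    """Server-side counterpart: read the optional actor_ids parameter."""
    raw = parse_qs(urlsplit(url).query).get("actor_ids", [""])[0]
    return [actor_id for actor_id in raw.split(",") if actor_id]


# Hypothetical local dashboard address; both Heads run on the same node.
url, headers = build_actor_request("http://127.0.0.1:8265", ["abc", "def"])
```

With aiohttp, the resulting pair could be passed to an existing ClientSession as session.get(url, headers=headers). An empty result from parse_actor_ids would mean "no filter", which keeps the parameter optional.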

Scope?

We will read directly from GCS wherever possible. For cases like this one, where adapting is not trivial, the call frequency is low, and the path is (maybe) non-critical, we can use an HTTP client.

Changes

  • In our cache middleware, add support for Cache-Control: no-cache.
  • In ActorHead, add an optional param actor_ids to the API GET /logical/actors.
  • In TrainHead, call that API with no-cache.
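The first change can be sketched as a small caching wrapper that bypasses its stored entry when the request carries Cache-Control: no-cache. This is an illustrative model, not the dashboard's actual middleware: the cached decorator, the (path, headers) handler signature, and the TTL are all assumptions, and a plain dict stands in for HTTP headers (which are case-insensitive in a real server).

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Mapping, Tuple

Handler = Callable[[str, Mapping[str, str]], Awaitable[str]]


def cached(ttl_seconds: float) -> Callable[[Handler], Handler]:
    """Cache responses per path, unless the caller sends no-cache."""

    def decorator(handler: Handler) -> Handler:
        cache: Dict[str, Tuple[float, str]] = {}

        async def wrapper(path: str, headers: Mapping[str, str]) -> str:
            directives = {
                d.strip().lower()
                for d in headers.get("Cache-Control", "").split(",")
            }
            now = time.monotonic()
            entry = cache.get(path)
            # Serve from cache only if allowed and the entry is still fresh.
            if "no-cache" not in directives and entry is not None:
                expires_at, body = entry
                if now < expires_at:
                    return body
            body = await handler(path, headers)
            cache[path] = (now + ttl_seconds, body)
            return body

        return wrapper

    return decorator


calls = 0


@cached(ttl_seconds=60.0)
async def get_actors(path: str, headers: Mapping[str, str]) -> str:
    """Stand-in handler that counts how often it really runs."""
    global calls
    calls += 1
    return f"response #{calls}"


async def demo() -> list:
    first = await get_actors("/logical/actors", {})
    second = await get_actors("/logical/actors", {})  # served from cache
    third = await get_actors("/logical/actors", {"Cache-Control": "no-cache"})
    return [first, second, third]


results = asyncio.run(demo())
```

In this sketch a no-cache request also refreshes the stored entry, so later cached reads see the newer response. Whether the real middleware should refresh or leave the cache untouched is a design choice left open here.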

Alternative

Currently, ResourceUsage updates are subscribed to by ReportHead and written to DataSource.node_physical_stats (moving to NodeHead in #49878). To achieve "true isolation" we would need to define a way to get a snapshot of ResourceUsage for a given Node, which could be a larger change.

Signed-off-by: Ruiyang Wang <rywang014@gmail.com>

stalebot commented Feb 24, 2025

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

  • If you'd like to keep this open, just leave any comment, and the stale label will be removed.

stalebot added the stale label Feb 24, 2025

stalebot commented Apr 25, 2025

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public Slack channel.

Thanks again for opening the issue!

Reviewers: No reviews
Assignees: No one assigned
Labels: community-backlog, stale
Projects: None yet
Milestone: No milestone
Participants: rynewang, hainesmichaelc
