
[Autoscaler][Placement Group] Skip placed bundle when requesting resource #48924

Open

mimiliaogo wants to merge 7 commits into ray-project:master from mimiliaogo:pg-overprovision

Conversation

@mimiliaogo (Contributor) commented Nov 25, 2024 (edited)

Why are these changes needed?

Before this PR, when a node in a placement group (PG) goes down, the autoscaler attempts to reschedule the entire PG (all bundles). However, this leads to overprovisioning. Details: #40212

This PR solves this by skipping already placed bundles (i.e., bundles with an associated node_id) when computing resource demands in the autoscaler.

Before: every bundle gets rescheduled.

After: only one node is scaled up.
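To illustrate the core idea, here is a minimal stand-alone sketch (plain Python with stand-in objects, not the actual Ray code; the real change lives in the autoscaler's placement_groups_to_resource_demands):

from types import SimpleNamespace

# Stand-ins for the GCS bundle data the autoscaler receives; node_id is an
# empty byte string until the bundle is placed on a node.
bundles = [
    SimpleNamespace(unit_resources={"CPU": 4}, node_id=b"\xab" * 28),  # placed
    SimpleNamespace(unit_resources={"CPU": 4}, node_id=b"\xcd" * 28),  # placed
    SimpleNamespace(unit_resources={"CPU": 4}, node_id=b""),           # lost its node
]

# Only bundles that still need a node contribute to the autoscaler's demand.
shapes = [dict(b.unit_resources) for b in bundles if not b.node_id]
assert shapes == [{"CPU": 4}]  # one node's worth of demand, not three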

Related issue number

Closes #40212

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@jcotant1 added the core (Issues that should be addressed in Ray Core) label Nov 25, 2024
@kevin85421 self-assigned this Nov 26, 2024
@kevin85421 (Member) left a comment:

add a test

        shapes = [dict(bundle.unit_resources) for bundle in placement_group.bundles]
        # Skip **placed** bundle (which has node id associated with it).
        for bundle in placement_group.bundles:
            if bundle.node_id:
Member:

Is it an empty string or None? If it is None, use is instead.

Suggested change:
-            if bundle.node_id:
+            if bundle.node_id is not None:

Contributor (Author):

Fixed in 7a44207; it should be an empty byte string.
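For context, an empty byte string is falsy in Python, so a plain truthiness check on bundle.node_id already distinguishes placed from unplaced bundles (small illustrative snippet; the 28-byte value is just a made-up node id):

unplaced = b""          # bundle not yet placed on any node
placed = b"\xab" * 28   # a placed bundle carries the raw node id bytes
assert not unplaced     # falsy: this bundle should still be counted as demand
assert placed           # truthy: this bundle can be skipped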

break

# TODO(mimi): kill_raylet won't trigger reschedule in autoscaler v1
# kill_raylet(node["NodeManagerAddress"], node["NodeManagerPort"])
Contributor (Author):

I found that when using kill_raylet, rescheduling won't be triggered in autoscaler v1, even when the cluster status shows the node is killed. In this case, v1 fails and v2 passes.
Both v1 and v2 pass when using kill_node.

from ray.autoscaler.v2.sdk import get_cluster_status

def verify_nodes(active=3, idle=1):
Member:

Suggested change:
-def verify_nodes(active=3, idle=1):
+def verify_nodes(active, idle):
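A possible shape for the helper after this change, as a sketch only: it assumes get_cluster_status accepts the GCS address and that the returned status exposes active_nodes and idle_nodes lists (those field names, and passing gcs_address explicitly, are assumptions for illustration, not taken from the PR):

from ray.autoscaler.v2.sdk import get_cluster_status


def verify_nodes(gcs_address, active, idle):
    # Field names on the returned status object are assumed for illustration.
    status = get_cluster_status(gcs_address)
    return len(status.active_nodes) == active and len(status.idle_nodes) == idle

In the test this would be driven through wait_for_condition, e.g. wait_for_condition(lambda: verify_nodes(gcs_address, 3, 1)).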

def kill_node(node_id):
    # kill -9
    import subprocess
Member:

Move import to top level. Typically, Ray uses deferred import only to avoid circular dependencies.
wait_for_condition(lambda: verify_nodes(3, 1))

# Kill a node
def kill_raylet(ip, port, graceful=True):
Member:

Remove this function because it is not used for now.
# Wait for the node to be removed
wait_for_condition(lambda: verify_nodes(2, 1), 20)

# Check that the placement group is rescheduled
Member:

Where is the logic to check that the placement group is rescheduled?

Contributor (Author):

wait_for_condition(lambda: verify_nodes(3, 1)) checks the autoscaler rescheduling. However, this comment was redundant, so I've removed it.


ray.get(pg.ready())

from ray.autoscaler.v2.sdk import get_cluster_status
Member:

Do we need to import this? It seems to have already been imported at the top level.
Contributor (Author):

The above suggestions are fixed in 415dcf8.

@kevin85421 added the go (add ONLY when ready to merge, run all tests) label Dec 17, 2024
@kevin85421 (Member):

CI fails. Can you fix the CI errors?

@@ -986,7 +986,13 @@ def placement_groups_to_resource_demands(
    resource_demand_vector = []
    unconverted = []
    for placement_group in pending_placement_groups:
        shapes = [dict(bundle.unit_resources) for bundle in placement_group.bundles]
        # Skip **placed** bundle (which has node id associated with it).

Is this behavior correct with STRICT_PACK? If already placed bundles are removed, will the new bundles be placed on different nodes?

Contributor:

Yes, the behavior is still correct with STRICT_PACK because we only calculate shapes here, and the shapes do not directly instruct scheduling. The scheduler will not schedule a STRICT_PACK group across different nodes.

As for the duty of this placement_groups_to_resource_demands function, it is correct to present the remaining demand of a STRICT_PACK group to the later get_bin_pack_residual. Say we have 3 bundles in a STRICT_PACK group but 2 of them already have a node_id; we still need to present the remaining bundle to get_bin_pack_residual, which should then consume it.
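As a toy worked example of that argument (plain Python only; the real bin-packing happens in get_bin_pack_residual, which is simplified here to a single capacity check):

# STRICT_PACK group: 3 bundles of 1 CPU each, 2 of them already placed on a node.
bundle_shapes = [{"CPU": 1}, {"CPU": 1}, {"CPU": 1}]
placed = [True, True, False]

# Demand handed to the bin-packing step: only the unplaced bundle.
remaining_demand = [shape for shape, p in zip(bundle_shapes, placed) if not p]
assert remaining_demand == [{"CPU": 1}]

# Simplified residual check: if the node hosting the group still has a free CPU,
# the remaining demand is consumed and no extra node is requested.
node_available = {"CPU": 2}
residual = [d for d in remaining_demand if d["CPU"] > node_available["CPU"]]
assert residual == []  # demand fits on the existing node; no over-provisioning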

@kevin85421 (Member):

cc @rueian, would you mind taking this PR for another pass?

@rueian (Contributor) commented Apr 1, 2025 (edited)

> cc @rueian, would you mind taking this PR for another pass?

The changes still look good to me, but in my experience we will be asked for a unit test in test_resource_demand_scheduler.py. cc @mimiliaogo.
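For reference, a rough sketch of what such a unit test might look like, following the proto-construction pattern used elsewhere in test_resource_demand_scheduler.py; the node_id keyword on Bundle, the two-value return of placement_groups_to_resource_demands, and the exact assertions are assumptions here, not verified against the final PR:

from ray.autoscaler._private.resource_demand_scheduler import (
    placement_groups_to_resource_demands,
)
from ray.core.generated.common_pb2 import Bundle, PlacementStrategy
from ray.core.generated.gcs_pb2 import PlacementGroupTableData


def test_placement_groups_to_resource_demands_skips_placed_bundles():
    # One bundle already carries a node_id (placed); the other two still need nodes.
    pg = PlacementGroupTableData(
        state=PlacementGroupTableData.RESCHEDULING,
        strategy=PlacementStrategy.PACK,
        bundles=[
            Bundle(unit_resources={"CPU": 2}, node_id=b"\xab" * 28),
            Bundle(unit_resources={"CPU": 2}),
            Bundle(unit_resources={"CPU": 2}),
        ],
    )
    demands, unconverted = placement_groups_to_resource_demands([pg])
    # Only the two unplaced bundles should contribute to the demand vector.
    assert demands == [{"CPU": 2}, {"CPU": 2}]
    assert unconverted == []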

@hainesmichaelc added the community-contribution (Contributed by the community) label Apr 4, 2025
Reviewers

@chmeyers left review comments
@rueian approved these changes
@kevin85421 approved these changes
@hongchaodeng: awaiting requested review

Assignees

@kevin85421

Labels

community-contribution (Contributed by the community), core (Issues that should be addressed in Ray Core), go (add ONLY when ready to merge, run all tests)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues:

[core][autoscaler][v1] Autoscaler overprovisions nodes when strict placement group is rescheduling

6 participants: @mimiliaogo, @kevin85421, @rueian, @chmeyers, @hainesmichaelc, @jcotant1
