[Autoscaler][Placement Group] Skip placed bundle when requesting resource #48924
Conversation
add a test
shapes = [dict(bundle.unit_resources) for bundle in placement_group.bundles]
# Skip **placed** bundle (which has node id associated with it).
for bundle in placement_group.bundles:
    if bundle.node_id:
Is it an empty string or `None`? If it is `None`, use `is` instead.
- if bundle.node_id:
+ if bundle.node_id is not None:
Fixed in 7a44207; it should be an empty byte string.
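For reference, a minimal sketch of why the truthiness check suffices; I'm assuming the `Bundle` protobuf is importable from `ray.core.generated.common_pb2` and that `node_id` is a `bytes` field, which defaults to `b""` when unset:

```python
# Sketch only; illustrates the empty-byte-string default, not the PR's code.
from ray.core.generated.common_pb2 import Bundle  # assumed import path

unplaced = Bundle(unit_resources={"GPU": 2})                      # node_id unset -> b""
placed = Bundle(unit_resources={"GPU": 2}, node_id=b"\xab" * 28)  # placed on a node

assert not unplaced.node_id  # falsy, so it stays in the demand shapes
assert placed.node_id        # truthy, so `if bundle.node_id:` skips it
```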
Force-pushed from 5818b44 to 6353847.
break
# TODO(mimi): kill_raylet won't trigger reschedule in autoscaler v1
# kill_raylet(node["NodeManagerAddress"], node["NodeManagerPort"])
I found that when using `kill_raylet`, rescheduling won't be triggered in autoscaler v1, even when the cluster status shows the node is killed. In this case, v1 fails and v2 passes. Both v1 and v2 pass when using `kill_node`.
from ray.autoscaler.v2.sdk import get_cluster_status
def verify_nodes(active=3, idle=1):
- def verify_nodes(active=3, idle=1):
+ def verify_nodes(active, idle):
def kill_node(node_id):
    # kill -9
    import subprocess
Move the `import` to top level. Typically, Ray uses deferred imports only to avoid circular dependencies.
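To illustrate the suggestion only (the helper body below is a placeholder, not the PR's actual implementation), the import would move to module scope:

```python
import subprocess  # hoisted to the top level; deferred imports are reserved for circular deps


def _raylet_pid_for(node_id):
    # Hypothetical lookup; the real test resolves the target process differently.
    raise NotImplementedError(node_id)


def kill_node(node_id):
    """Simulate abrupt node loss by SIGKILL-ing the node's raylet (kill -9)."""
    subprocess.run(["kill", "-9", str(_raylet_pid_for(node_id))], check=True)
```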
wait_for_condition(lambda: verify_nodes(3, 1))
# Kill a node
def kill_raylet(ip, port, graceful=True):
Remove this function because it is not used for now.
# Wait for the node to be removed
wait_for_condition(lambda: verify_nodes(2, 1), 20)
# Check that the placement group is rescheduled
Where is the logic to check that the placement group is rescheduled?
`wait_for_condition(lambda: verify_nodes(3, 1))` checks the autoscaler rescheduling. However, this comment was redundant, so I've removed it.
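For context, the check looks roughly like this; I'm assuming `get_cluster_status` takes the GCS address and returns a status object with `active_nodes` and `idle_nodes` lists (field names inferred from the `verify_nodes(active, idle)` signature, not verified against the exact API):

```python
# Sketch, not the PR's test verbatim; assumes a running Ray cluster.
import ray
from ray._private.test_utils import wait_for_condition
from ray.autoscaler.v2.sdk import get_cluster_status


def verify_nodes(active, idle):
    status = get_cluster_status(ray.get_runtime_context().gcs_address)
    return len(status.active_nodes) == active and len(status.idle_nodes) == idle


# After one node is killed, the autoscaler should restore 3 active + 1 idle
# nodes by launching a single replacement node for the displaced bundle.
wait_for_condition(lambda: verify_nodes(3, 1))
```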
ray.get(pg.ready())
from ray.autoscaler.v2.sdk import get_cluster_status
Do we need to import this? It seems to have already been imported at the top level.
The above suggestions are fixed in 415dcf8.
CI fails. Can you fix the CI errors?
Signed-off-by: Mimi Liao <mimiliao2000@gmail.com>
Force-pushed from 415dcf8 to d9f0cd9.
@@ -986,7 +986,13 @@ def placement_groups_to_resource_demands(
    resource_demand_vector = []
    unconverted = []
    for placement_group in pending_placement_groups:
        shapes = [dict(bundle.unit_resources) for bundle in placement_group.bundles]
        # Skip **placed** bundle (which has node id associated with it).
Is this behavior correct with `STRICT_PACK`? If already placed bundles are removed, will the new bundles be placed on different nodes?
Yes, the behavior is still correct with `STRICT_PACK`, because we only calculate shapes here, and these shapes do not directly instruct scheduling. The scheduler will not schedule a `STRICT_PACK` group across different nodes.
Coming back to the duty of this `placement_groups_to_resource_demands` function: it is correct to present the remaining demand of a `STRICT_PACK` group to the later `get_bin_pack_residual`. Say we have 3 bundles in a `STRICT_PACK` group but 2 of the bundles have a node_id on them; we still need to present the remaining bundle to `get_bin_pack_residual`, which should then consume it.
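In code, the shape computation described above amounts to something like the following sketch (a paraphrase of the diff, not the exact merged code; only the protobuf import path is assumed):

```python
from typing import Dict, List

from ray.core.generated.gcs_pb2 import PlacementGroupTableData


def unplaced_bundle_shapes(pg: PlacementGroupTableData) -> List[Dict[str, float]]:
    """Resource shapes for bundles that are not yet placed on any node."""
    return [
        dict(bundle.unit_resources)
        for bundle in pg.bundles
        if not bundle.node_id  # placed bundles carry a non-empty node_id (bytes)
    ]
```

Only these remaining shapes are handed to `get_bin_pack_residual`, so a `STRICT_PACK` group with two placed bundles contributes just its one unplaced bundle to the demand.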
cc @rueian, would you mind taking this PR for another pass?
rueian commented Apr 1, 2025 (edited):
The changes still look good to me, but in my personal experience, we will be asked for a unit test in
Signed-off-by: Rueian <rueiancsie@gmail.com>
A new unit test is added to
# Only provision nodes for unplaced bundles;
# avoid rescheduling the whole placement group.
wait_for_condition(lambda: verify_nodes(3, 1))
Do we need to verify whether the new node is `R1`?
Verified that in the new commit.
# fully idle.
nodes = provider.non_terminated_nodes({})
resource_demands = [{"GPU": 1}] * 4
Remove this. It implies that 4 GPUs on the p2.8xlarge are occupied by these resource demands, which isn't easy to understand at first glance.
In addition, we also need to check whether the 4 GPUs on the p2.8xlarge are actually occupied. If they aren't, and the bundles require 8 GPUs in total, the test may still pass even though the underlying behavior is incorrect.
- Remove `resource_demands`.
- Increase each bundle from 2 GPUs to 4 GPUs.
provider.create_node({}, {TAG_RAY_USER_NODE_TYPE: "p2.8xlarge"}, 1)
# At this point our cluster has 1 p2.8xlarge instances (8 GPUs) and is
# fully idle.
nodes = provider.non_terminated_nodes({})
Does this simulate the case that would happen at runtime (`node_1` doesn't exist in `nodes`)?
I replaced `node_1` with the existing node in the new commit.
Signed-off-by: Rueian <rueiancsie@gmail.com>
@rueian, can you open an issue in the KubeRay repo to track the progress of adding KubeRay e2e tests for this PR?
PlacementGroupTableData(
    state=PlacementGroupTableData.PENDING,
    strategy=PlacementStrategy.PACK,
    bundles=[
I am also not sure whether this can happen at runtime.
Imagine that the placement group was originally spread across 2 nodes (that is possible with the best-effort
PACK strategy) but later the second node disappeared. Now, we have 1 node left alive, and if it has enough resources available this time for the bundle that was originally on the disappeared node, then we should not launch a new node.
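A sketch of that scenario expressed as unit-test input (field values are illustrative; the import paths and the `RESCHEDULING` state follow the existing autoscaler tests to the best of my knowledge):

```python
from ray.core.generated.common_pb2 import Bundle, PlacementStrategy
from ray.core.generated.gcs_pb2 import PlacementGroupTableData

# A PACK group originally spread over two nodes; the second node has died.
# The first bundle keeps its node_id, so only the second bundle should show up
# as demand, and the surviving node may absorb it without launching a new node.
pending_pg = PlacementGroupTableData(
    state=PlacementGroupTableData.RESCHEDULING,
    strategy=PlacementStrategy.PACK,
    bundles=[
        Bundle(unit_resources={"GPU": 4}, node_id=b"\xaa" * 28),  # still placed
        Bundle(unit_resources={"GPU": 4}),                        # node_id unset -> demanded
    ],
)
```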
@rueian please ping me when all CI tests pass. Thanks!
Hi @kevin85421, all CI tests have passed.
Sure.
Merged commit ab03e3b into ray-project:master.
…urce (ray-project#48924) Signed-off-by: Mimi Liao <mimiliao2000@gmail.com> Signed-off-by: zhaoch23 <c233zhao@uwaterloo.ca>
…urce (ray-project#48924) Signed-off-by: Mimi Liao <mimiliao2000@gmail.com> Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
…urce (ray-project#48924) Signed-off-by: Mimi Liao <mimiliao2000@gmail.com>
…urce (ray-project#48924) Signed-off-by: Mimi Liao <mimiliao2000@gmail.com> Signed-off-by: Vicky Tsang <vtsang@amd.com>
…urce (ray-project#48924) Signed-off-by: Mimi Liao <mimiliao2000@gmail.com> Signed-off-by: Scott Lee <scott.lee@rebellions.ai>
Why are these changes needed?
Before this PR, when a node in a placement group (PG) went down, the autoscaler attempted to reschedule the entire PG (all bundles), which leads to overprovisioning. Details: #40212
This PR solves this by skipping already placed bundles (i.e., bundles with an associated node_id) when demanding resources in the autoscaler.
Before: every bundle gets rescheduled.
After: only one node is scaled up.
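For illustration, the user-visible scenario looks roughly like this (a sketch, not the PR's test; it assumes an autoscaling cluster is already running):

```python
import ray
from ray.util.placement_group import placement_group

ray.init(address="auto")  # connect to an existing autoscaling cluster (assumption)

# Spread three 1-CPU bundles across the cluster.
pg = placement_group([{"CPU": 1}] * 3, strategy="SPREAD")
ray.get(pg.ready())

# If a node hosting one bundle dies, the PG goes into rescheduling.
# Before this PR: the autoscaler saw demand for all 3 bundles and could
# over-provision several nodes. After this PR: only the bundle without a
# node_id is reported as demand, so at most one replacement node is launched.
```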
Related issue number
Closes #40212
Checks
- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I've added a new method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.