[Autoscaler][Placement Group] Skip placed bundle when requesting resource #48924
Conversation
add a test
```python
shapes = [dict(bundle.unit_resources) for bundle in placement_group.bundles]
# Skip **placed** bundle (which has node id associated with it).
for bundle in placement_group.bundles:
    if bundle.node_id:
```
Is it an empty string or `None`? If it is `None`, use `is` instead.
Suggested change:

```diff
-if bundle.node_id:
+if bundle.node_id is not None:
```
Fixed in 7a44207; it should be an empty byte string.
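As a side note, an empty byte string is falsy in Python, so the plain truthiness check `if bundle.node_id:` distinguishes placed from unplaced bundles, assuming `node_id` is `b""` until placement, as stated above. A minimal sketch:

```python
# Minimal sketch: bundle.node_id is assumed to be an empty byte string (b"")
# until the bundle is placed on a node.
placed_node_id = b"\x01" * 28    # hypothetical value for a placed bundle
unplaced_node_id = b""           # empty byte string: not placed yet

assert bool(placed_node_id) is True      # truthy  -> bundle is placed, skip it
assert bool(unplaced_node_id) is False   # falsy   -> bundle still needs resources
```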
5818b44 to 6353847

```python
break
# TODO(mimi): kill_raylet won't trigger reschedule in autoscaler v1
# kill_raylet(node["NodeManagerAddress"], node["NodeManagerPort"])
```
I found that when using `kill_raylet`, rescheduling is not triggered in autoscaler v1, even when the cluster status shows the node as killed. In that case v1 fails and v2 passes. Both v1 and v2 pass when using `kill_node`.
```python
from ray.autoscaler.v2.sdk import get_cluster_status

def verify_nodes(active=3, idle=1):
```
Suggested change:

```diff
-def verify_nodes(active=3, idle=1):
+def verify_nodes(active, idle):
```
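For context, a possible shape of this helper built on `get_cluster_status` is sketched below; the `gcs_address` argument and the `active_nodes`/`idle_nodes` attribute names are assumptions about the autoscaler v2 SDK, not code quoted from the PR.

```python
import ray
from ray.autoscaler.v2.sdk import get_cluster_status


def verify_nodes(active, idle):
    # Sketch: assert that the cluster reports the expected number of active
    # and idle nodes. Attribute names are assumed, not taken from the PR.
    cluster_state = get_cluster_status(ray.get_runtime_context().gcs_address)
    assert len(cluster_state.active_nodes) == active
    assert len(cluster_state.idle_nodes) == idle
    return True
```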
```python
def kill_node(node_id):
    # kill -9
    import subprocess
```
Move the `import` to the top level. Typically, Ray uses deferred imports only to avoid circular dependencies.
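In other words, something along these lines; the `pgrep`-based process lookup inside `kill_node` is a hypothetical illustration, not the PR's actual implementation.

```python
import subprocess  # imported once at module top level, not inside the helper


def kill_node(node_id):
    # Hypothetical sketch: force-kill (kill -9) the raylet process whose
    # command line mentions the given node id.
    subprocess.run(f"kill -9 $(pgrep -f {node_id})", shell=True, check=False)
```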
```python
wait_for_condition(lambda: verify_nodes(3, 1))

# Kill a node
def kill_raylet(ip, port, graceful=True):
```
Remove this function because it is not used for now.
```python
# Wait for the node to be removed
wait_for_condition(lambda: verify_nodes(2, 1), 20)
# Check that the placement group is rescheduled
```
Where is the logic that checks the placement group is rescheduled?
`wait_for_condition(lambda: verify_nodes(3, 1))` is what checks the autoscaler rescheduling. The comment was redundant, though, so I've removed it.
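Putting the excerpts together, the tail of the test presumably looks roughly like the sketch below (reconstructed from the snippets in this review and reusing the `verify_nodes` and `kill_node` helpers sketched above; not the exact test code).

```python
from ray._private.test_utils import wait_for_condition


def run_reschedule_scenario(node_id_to_kill):
    # 1. Placement group is up: 3 active worker nodes plus 1 idle node.
    wait_for_condition(lambda: verify_nodes(3, 1))

    # 2. Kill one node that hosts a bundle of the placement group.
    kill_node(node_id_to_kill)

    # 3. Wait until the killed node is removed from the cluster view.
    wait_for_condition(lambda: verify_nodes(2, 1), 20)

    # 4. The autoscaler reschedules only the lost bundle, so a single new
    #    node comes up and the cluster returns to 3 active + 1 idle.
    wait_for_condition(lambda: verify_nodes(3, 1))
```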
```python
ray.get(pg.ready())
from ray.autoscaler.v2.sdk import get_cluster_status
```
Do we need to import this? It seems to have already been imported at the top level.
The above suggestions are fixed in 415dcf8.

CI fails. Can you fix the CI errors?
Signed-off-by: Mimi Liao <mimiliao2000@gmail.com>
415dcf8 to d9f0cd9

```diff
@@ -986,7 +986,13 @@ def placement_groups_to_resource_demands(
     resource_demand_vector = []
     unconverted = []
     for placement_group in pending_placement_groups:
         shapes = [dict(bundle.unit_resources) for bundle in placement_group.bundles]
         # Skip **placed** bundle (which has node id associated with it).
```
Is this behavior correct with `STRICT_PACK`? If already placed bundles are removed, will the new bundles be placed on different nodes?
Yes, the behavior is still correct with `STRICT_PACK`, because we only calculate shapes here, and these shapes do not directly instruct scheduling; the scheduler will not schedule a `STRICT_PACK` group across different nodes.

Coming back to the duty of this `placement_groups_to_resource_demands` function: it is correct to present only the remaining demand of a `STRICT_PACK` group to the later `get_bin_pack_residual`. Say we have 3 bundles in a `STRICT_PACK` group but 2 of them already have a node_id; we still need to present the remaining 1 bundle to `get_bin_pack_residual`, which should then consume it.
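As a concrete illustration of that remaining-demand calculation, here is a simplified, self-contained sketch (plain dicts stand in for the bundle protobufs; this is not the actual autoscaler code):

```python
def remaining_demand(bundles):
    # Only bundles without an associated node id still need resources.
    return [
        dict(b["unit_resources"])
        for b in bundles
        if not b["node_id"]  # empty byte string => not placed yet
    ]


# STRICT_PACK group with 3 bundles, 2 of which are already placed.
bundles = [
    {"unit_resources": {"CPU": 1}, "node_id": b"\x01" * 28},
    {"unit_resources": {"CPU": 1}, "node_id": b"\x02" * 28},
    {"unit_resources": {"CPU": 1}, "node_id": b""},
]

# Only the single unplaced bundle is reported onward (e.g. to
# get_bin_pack_residual), so capacity is added for one bundle, not three.
assert remaining_demand(bundles) == [{"CPU": 1}]
```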
cc @rueian, would you mind taking this PR for another pass?
The changes still look good to me, but in my personal experience, we will be asked for a unit test in
Why are these changes needed?
Before this PR, when a node in a placement group (PG) went down, the autoscaler attempted to reschedule the entire PG (all bundles), which leads to overprovisioning. Details: #40212
This PR solves the problem by skipping already placed bundles (i.e., bundles with an associated node_id) when computing resource demands in the autoscaler.
Before: every bundle gets rescheduled.
After: only one node is scaled up.
Related issue number

Closes #40212
Checks

- I've signed off every commit (`git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.