- Notifications
You must be signed in to change notification settings - Fork6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
[core] split scheduler into smaller targets to improve build performance#51641
Conversation
Hey,@dentiny! Would you mind to take a look? |
Thanks for the benchmark! Curious what target are you building? |
BUILD.bazel Outdated
ray_cc_library( | ||
name = "scheduling_options", | ||
hdrs = ["src/ray/raylet/scheduling/policy/scheduling_options.h"], | ||
deps = [":ray_common"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
"ray_common" is a giant dependency, please use necessary sub-dependencies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
If you think it too hard, I'm ok with leaving a TODO and make a new PR
BUILD.bazel Outdated
":local_resource_manager", | ||
":ray_common", | ||
"@com_google_absl//absl/container:flat_hash_map", | ||
"@com_google_googletest//:gtest", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
depending ongtest
for production code is not correct, please use"@com_google_googletest//:gtest_prod"
instead
":cluster_resource_scheduler", | ||
":cluster_task_manager_interface", | ||
":local_task_manager_interface", | ||
":scheduler_resource_reporter", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Miss a few common libraries and abseil libraries
ray_cc_library( | ||
name = "cluster_task_manager_interface", | ||
hdrs = ["src/ray/raylet/scheduling/cluster_task_manager_interface.h"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
please add dependencies for all headers
ray/src/ray/raylet/scheduling/cluster_task_manager_interface.h
Lines 20 to 21 in1103bd1
#include"ray/rpc/server_call.h" | |
#include"src/ray/protobuf/node_manager.pb.h" |
BUILD.bazel Outdated
"@com_google_absl//absl/memory", | ||
"@com_google_absl//absl/random", | ||
"@com_google_absl//absl/random:bit_gen_ref", | ||
"@com_google_absl//absl/strings", | ||
"@com_google_googletest//:gtest", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
same here, should usegtest_prod
BUILD.bazel Outdated
name = "affinity_with_bundle_scheduling_policy", | ||
srcs = ["src/ray/raylet/scheduling/policy/affinity_with_bundle_scheduling_policy.cc"], | ||
hdrs = ["src/ray/raylet/scheduling/policy/affinity_with_bundle_scheduling_policy.h"], | ||
deps = [":scheduling_policy"], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
could you please also add
#include"ray/common/bundle_location_index.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for the effort! Overall it LGTM, my comments are:
- Don't use
gtest
as dependency for production targets - Add all dependent targets to avoid implicit dependencies
- Don't use
ray_common
, which is a giant target only made for backward compatibility
If you think it's hard / overwhelming to make all these changes on one iteration, I'm ok with you leaving TODO items and make new PRs.
Let me know if you have any problems / questions, happy to help :) |
The target //:scheduler only. I have updated some reproduce steps. |
BUILD.bazel Outdated
":node_label_scheduling_policy", | ||
":random_scheduling_policy", | ||
":spread_scheduling_policy", | ||
"//src/ray/util", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Sorry for being picky, but could you please remove the header inclusion and dependency ofray/util
?
#include"ray/util/util.h" |
I don't think it's used anywhere
BUILD.bazel Outdated
hdrs = ["src/ray/raylet/scheduling/policy/hybrid_scheduling_policy.h"], | ||
deps = [ | ||
":scheduling_policy", | ||
"//src/ray/util", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Same here, could you please remove the header and deps forray/util
?
#include"ray/util/util.h" |
BUILD.bazel Outdated
hdrs = ["src/ray/raylet/scheduling/policy/random_scheduling_policy.h"], | ||
deps = [ | ||
":scheduling_policy", | ||
"//src/ray/util", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
same here:
#include"ray/util/util.h" |
BUILD.bazel Outdated
deps = [ | ||
":hybrid_scheduling_policy", | ||
":scheduling_policy", | ||
"//src/ray/util", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Same:
#include"ray/util/util.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
container util seems useless as well:
#include"ray/util/container_util.h" |
BUILD.bazel Outdated
deps = [ | ||
":scheduling_policy", | ||
"//src/ray/util", | ||
"//src/ray/util:container_util", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
container_util
seems unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
const auto &node = map_find_or_die(nodes_, node_id);
still need this header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
You're right, thank you for checking!
BUILD.bazel Outdated
":random_scheduling_policy", | ||
":spread_scheduling_policy", | ||
"//src/ray/util", | ||
"//src/ray/util:container_util", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Seem unused, could you please try remove?
hdrs = ["src/ray/raylet/scheduling/local_resource_manager.h"], | ||
deps = [ | ||
":gcs_client_lib", | ||
":ray_common", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Hi could you please paste the TODO (and also the github issue) everywhere, so we don't miss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thank you so much for the change! Overall LGTM, just a few last comments:
- Let's leave github issue links everywhere so we don't forget;
ray/util
andray/container_util
seems largely unused, let's cleanup the header inclusion and dependencies
The microcheck failure doesn't seem to be related to your change, so let's ignore it for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM! Thank you for your contribution!
I just enabled CI for you, feel free to ping me when it passes.
Signed-off-by: Ziy1-Tan <ajb459684460@gmail.com>
Signed-off-by: Ziy1-Tan <ajb459684460@gmail.com>
Signed-off-by: Ziy1-Tan <ajb459684460@gmail.com>
Thank you for your reviews! I think it's time to merge. |
…nce (ray-project#51641)Closesray-project#51634Before:```INFO: Elapsed time: 600.027s, Critical Path: 94.84sINFO: 3687 processes: 460 disk cache hit, 1400 internal, 1827 linux-sandbox.INFO: Build completed successfully, 3687 total actions```After:```INFO: Elapsed time: 440.950s, Critical Path: 89.94sINFO: 3932 processes: 1647 internal, 2285 linux-sandbox.INFO: Build completed successfully, 3932 total actions```Reproduce steps:```bazel cleansudo sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"bazel build //:scheduler```About **26.5%** compile time reduction.---------Signed-off-by: Ziy1-Tan <ajb459684460@gmail.com>Signed-off-by: Justin Miller <justinrmiller@gmail.com>
…nce (ray-project#51641)Closesray-project#51634 Before:```INFO: Elapsed time: 600.027s, Critical Path: 94.84sINFO: 3687 processes: 460 disk cache hit, 1400 internal, 1827 linux-sandbox.INFO: Build completed successfully, 3687 total actions```After:```INFO: Elapsed time: 440.950s, Critical Path: 89.94sINFO: 3932 processes: 1647 internal, 2285 linux-sandbox.INFO: Build completed successfully, 3932 total actions```Reproduce steps:```bazel cleansudo sync; sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"bazel build //:scheduler```About **26.5%** compile time reduction.---------Signed-off-by: Ziy1-Tan <ajb459684460@gmail.com>
Why are these changes needed?
Related issue number
Closes#51634
Before:
After:
Reproduce steps:
About26.5% compile time reduction.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.