- Notifications
You must be signed in to change notification settings - Fork6.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
[Core] Use real CPU count available to a Ray process#46424
Conversation
Signed-off-by: Superskyyy <yihaochen@apache.org>
9fedc13
tob47089e
Compare8c971b7
to2f6e3f6
CompareI'm sure you already know this, but while we wait on this to get reviewed, merged and released, you can work around the issue by specifying the |
return multiprocessing.cpu_count() | ||
if hasattr(os, "sched_getaffinity"): | ||
# Reflects the real CPU count available to the calling thread of the process. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This could use a bit more explanation -- why is CPU count available to the calling thread of this process important?
What's important is the CPUs available to the Ray worker processes of the Ray node that this process is starting.
Are the CPUs available to the Ray worker processes on this Ray node necessarily the same as the CPUs available to theray start
process?
Also, could you explain more, in documentation/comments, about the type of use-case this logic is aimed at?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
can you tell me exactly why your environment was seeingos.sched_getaffinity(0)
? What kind of tools are you using when starting ray?
This PR fixes the issue where inaccurate CPU count was detected when a node is shared by multiple users or an external tool has added restriction for processes to access all CPUs on a node.
multiprocessing.cpu_count
is kept for fallback and still referenced due to back-wards compatible behavior inside docker.closes:#34846
Why are these changes needed?
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.