


[Core] Use real CPU count available to a Ray process #46424


@Superskyyy (Contributor) commented Jul 3, 2024

This PR fixes an issue where Ray detects an inaccurate CPU count when a node is shared by multiple users, or when an external tool has restricted the set of CPUs that processes on the node are allowed to use.

`multiprocessing.cpu_count()` is kept as a fallback, and is still referenced to preserve backward-compatible behavior inside Docker.

Closes: #34846
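One plausible ordering for the detection described above (affinity mask first, `multiprocessing.cpu_count()` as the fallback) is sketched below. This is a minimal illustration, not the PR's actual diff; `detect_usable_cpus` is a hypothetical name, and the affinity branch only applies on platforms such as Linux where `os.sched_getaffinity` exists.

```python
import multiprocessing
import os


def detect_usable_cpus() -> int:
    """Hypothetical helper: number of CPUs this process may actually run on.

    Prefers the scheduler affinity mask (narrowed by tools such as
    `taskset` or a cpuset cgroup) and falls back to the machine total.
    """
    if hasattr(os, "sched_getaffinity"):
        # pid 0 means "the calling thread"; this mask can be smaller
        # than the number of CPUs installed in the machine.
        return len(os.sched_getaffinity(0))
    # Platforms without sched_getaffinity (e.g. macOS, Windows).
    return multiprocessing.cpu_count()


if __name__ == "__main__":
    print("machine total:", multiprocessing.cpu_count())
    print("usable here:  ", detect_usable_cpus())
```

On a Linux machine with at least two CPUs, running this under `taskset -c 0,1 python detect.py` should report 2 usable CPUs while the machine total stays unchanged.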

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
  • I've run `scripts/format.sh` to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in `doc/source/tune/api/` under the
      corresponding `.rst` file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Superskyyy <yihaochen@apache.org>
@Superskyyy force-pushed the fix/use-real-cpu-count branch from 9fedc13 to b47089e July 3, 2024 21:00
Signed-off-by: Superskyyy <yihaochen@apache.org>
@Superskyyy force-pushed the fix/use-real-cpu-count branch from 8c971b7 to 2f6e3f6 July 4, 2024 16:08
@anyscalesam added the `triage` and `core` labels Aug 8, 2024
@DmitriGekhtman (Contributor) commented Aug 11, 2024

I'm sure you already know this, but while we wait for this to be reviewed, merged, and released, you can work around the issue by passing the `--num-cpus` parameter to `ray start` explicitly.
In fact, you may find that option to be more robust and reliable -- speaking for myself, I prefer "stated and explicit" over "inferred and implicit".
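As a sketch of that workaround, a launcher script could measure the CPUs it is allowed to use and state them explicitly on the `ray start` command line. `build_ray_start_command` is a hypothetical helper; `--head` and `--num-cpus` are existing `ray start` options, and the script only builds and prints the command rather than running it.

```python
import os
import shlex


def build_ray_start_command(num_cpus: int) -> list[str]:
    """Hypothetical helper: an explicit `ray start` invocation for a head node."""
    if num_cpus < 1:
        raise ValueError("num_cpus must be at least 1")
    # Stated and explicit: Ray does not need to infer the CPU count.
    return ["ray", "start", "--head", f"--num-cpus={num_cpus}"]


if __name__ == "__main__":
    # Use the affinity mask where available so the stated value matches
    # the CPUs the OS will actually schedule this process on.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = os.cpu_count() or 1
    print(shlex.join(build_ray_start_command(usable)))
```

Passing the returned list to `subprocess.run(cmd, check=True)` would actually start the node; a worker node would additionally need `--address`.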

```python
return multiprocessing.cpu_count()

if hasattr(os, "sched_getaffinity"):
    # Reflects the real CPU count available to the calling thread of the process.
```

This could use a bit more explanation -- why is CPU count available to the calling thread of this process important?
What's important is the CPUs available to the Ray worker processes of the Ray node that this process is starting.
Are the CPUs available to the Ray worker processes on this Ray node necessarily the same as the CPUs available to the `ray start` process?

Also, could you explain more, in documentation/comments, about the type of use-case this logic is aimed at?
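One way to probe the question above on Linux is to check whether a child process sees the same affinity mask as its parent. On Linux, a child created via fork/exec inherits the CPU affinity mask of the thread that created it. The sketch assumes that Ray worker processes are descendants of the process that measured the mask and that nothing in between resets affinity; `child_cpu_count` and `child_cpu_count_pinned` are hypothetical helpers.

```python
import os
import subprocess
import sys

# Linux-only code run in a fresh interpreter: report how many CPUs *it* may use.
_CHILD_CODE = "import os; print(len(os.sched_getaffinity(0)))"


def child_cpu_count() -> int:
    """Spawn a child Python process and return the size of its affinity mask."""
    result = subprocess.run(
        [sys.executable, "-c", _CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def child_cpu_count_pinned(cpus: set[int]) -> int:
    """Temporarily narrow this thread's mask, then ask a child what it sees."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, cpus)
    try:
        return child_cpu_count()
    finally:
        # Restore the original mask so the caller is unaffected.
        os.sched_setaffinity(0, original)


if __name__ == "__main__" and hasattr(os, "sched_getaffinity"):
    print("parent:", len(os.sched_getaffinity(0)), "child:", child_cpu_count())
    first_cpu = min(os.sched_getaffinity(0))
    print("child after pinning to one CPU:", child_cpu_count_pinned({first_cpu}))
```

If some component between `ray start` and the workers changed affinity, the child's count would differ from the parent's; this sketch does not check Ray's actual process tree.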

@rkooo567 left a comment

Can you tell me exactly why your environment was seeing `os.sched_getaffinity(0)`? What kind of tools are you using when starting Ray?
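To help answer that question, the diagnostic sketch below prints three numbers that commonly disagree: the machine total, the affinity mask (narrowed by `taskset` or a cpuset such as `docker run --cpuset-cpus`), and a cgroup v2 CPU quota (set by e.g. `docker run --cpus`), which neither of the first two reflects. It assumes cgroup v2 with the process's own cgroup visible at `/sys/fs/cgroup/cpu.max`, as is typical inside a container; `parse_cpu_max` and `cpu_report` are hypothetical names.

```python
import os
from pathlib import Path
from typing import Optional

CPU_MAX = Path("/sys/fs/cgroup/cpu.max")  # assumed cgroup v2 location


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse cgroup v2 `cpu.max` ("<quota> <period>" or "max <period>").

    Returns the quota expressed as a number of CPUs, or None when unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cpu_report() -> dict:
    """Collect the CPU counts reported by different mechanisms."""
    report = {"os_cpu_count": os.cpu_count()}
    if hasattr(os, "sched_getaffinity"):
        report["affinity"] = len(os.sched_getaffinity(0))
    try:
        report["cgroup_quota_cpus"] = parse_cpu_max(CPU_MAX.read_text())
    except (OSError, ValueError):
        # Not cgroup v2, or the file is absent, unreadable, or malformed.
        report["cgroup_quota_cpus"] = None
    return report


if __name__ == "__main__":
    for name, value in cpu_report().items():
        print(f"{name:>18}: {value}")
```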

DmitriGekhtman reacted with thumbs up emoji
@jjyao added the `@external-author-action-required` and `P1` labels and removed the `triage` label Sep 16, 2024
@hainesmichaelc added the `community-contribution` label Apr 4, 2025
Reviewers

@rkooo567 left review comments

@DmitriGekhtman left review comments

Awaiting requested review from @jjyao

Labels
community-contribution: Contributed by the community
core: Issues that should be addressed in Ray Core
@external-author-action-required: Alternate tag for PRs where the author doesn't have labeling permission
P1: Issue that should be fixed within a few weeks
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

[Core] Incorrect detection of cpus
6 participants
@Superskyyy @DmitriGekhtman @rkooo567 @jjyao @hainesmichaelc @anyscalesam
