c10d/Store: add nonblocking mode to queue_pop #151485


Closed
d4l3k wants to merge 1 commit into main from d4l3k/queue_block

d4l3k (Member) commented Apr 16, 2025 (edited by pytorch-bot)

This adds a non-blocking mode to queue_pop, which lets workers poll for ready work without blocking the main loop. This is useful when you want to keep a GPU at maximum utilization while items arrive on the queue only periodically.

We also expose a torch.distributed.QueueEmptyError so users can catch the error and handle it accordingly.
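
As a rough illustration of the polling pattern described above, the following sketch uses a small in-memory stand-in rather than a real c10d store, so it runs without PyTorch. `FakeStore`, the local `QueueEmptyError` class, `poll_once`, `run_loop`, and the `block` keyword name are all assumptions made for this example; the description above only states that queue_pop gains a non-blocking mode and that torch.distributed.QueueEmptyError is exposed.

```python
import collections


class QueueEmptyError(Exception):
    """Local stand-in for torch.distributed.QueueEmptyError (hypothetical)."""


class FakeStore:
    """Hypothetical in-memory stand-in for a c10d Store with queue support."""

    def __init__(self):
        self._queues = collections.defaultdict(collections.deque)

    def queue_push(self, key, value):
        self._queues[key].append(value)

    def queue_pop(self, key, block=True):
        # Assumption: a `block` flag selects the non-blocking mode.
        queue = self._queues[key]
        if queue:
            return queue.popleft()
        if not block:
            # Non-blocking mode: report "nothing ready" instead of waiting.
            raise QueueEmptyError(f"queue {key!r} is empty")
        raise NotImplementedError("blocking pop is out of scope for this sketch")


def poll_once(store, key):
    """Return the next queued item, or None if nothing is ready right now."""
    try:
        return store.queue_pop(key, block=False)
    except QueueEmptyError:
        return None


def run_loop(store, key, steps):
    """Interleave main-loop work with non-blocking polls; return handled items."""
    handled = []
    for _ in range(steps):
        item = poll_once(store, key)
        if item is not None:
            handled.append(item)
        # Main-loop work (for example, a training step) would go here.
    return handled
```

With a real store, the same try/except shape would presumably catch torch.distributed.QueueEmptyError instead of the local stand-in class, so the loop never stalls while the queue is empty.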

Test plan:

pytest test/distributed/test_store.py -k queue -v -s -x

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab

pytorch-bot (bot) commented Apr 16, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151485

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 31825f4 with merge base 7f52875:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot (bot) added the oncall: distributed and release notes: distributed (c10d) labels on Apr 16, 2025

fduwjj (Contributor) left a comment

LGTM

Comment on lines +124 to +126
      PyModule_AddObject(
          module, "_DistQueueEmptyError", THPException_DistQueueEmptyError) ==
      0);
fduwjj (Contributor) commented:

Is the purpose of this to expose the exception as a PyTorch Python exception object?

d4l3k (Member, Author) replied:

Yes, we need to do this to surface it to Python. Though honestly, maybe we should move all of the PTD errors to pybind instead of THP for exception translation. THP is just so painful to work with.

d4l3k (Member, Author) commented:

@pytorchbot merge

pytorch-bot (bot) added the ciflow/trunk label on Apr 17, 2025

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging
Check the merge workflow status here

pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job

Failing merge rule: Core Maintainers

d4l3k (Member, Author) commented:

@pytorchbot merge -i

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged while ignoring the following 7 checks:
pull / unstable-linux-focal-cuda12.6-py3.10-gcc11-sm89-xfail / build
pull / linux-focal-py3.9-clang10 / test (default, 1, 5, lf.ephemeral.linux.4xlarge)
pull / linux-focal-py3.9-clang10 / test (crossref, 2, 2, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.13-clang10 / test (default, 3, 5, lf.ephemeral.linux.4xlarge)
pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, lf.ephemeral.linux.2xlarge)
pull / linux-jammy-py3.9-gcc11 / test (default, 3, 5, lf.ephemeral.linux.2xlarge)
pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6, lf.ephemeral.linux.4xlarge)


pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job has failed: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Details for Dev Infra team: raised by workflow job

d4l3k (Member, Author) commented:

@pytorchbot merge -i

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged while ignoring the following 14 checks:
pull / unstable-linux-focal-cuda12.6-py3.10-gcc11-sm89-xfail / build
pull / linux-focal-py3.9-clang10 / test (default, 1, 5, lf.ephemeral.linux.4xlarge)
pull / linux-focal-py3.9-clang10 / test (crossref, 2, 2, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.9-clang10 / test (dynamo_wrapped, 1, 3, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.13-clang10 / test (default, 3, 5, lf.ephemeral.linux.4xlarge)
pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.13-clang10 / test (dynamo_wrapped, 1, 3, lf.ephemeral.linux.2xlarge)
pull / linux-jammy-py3.9-gcc11 / test (default, 3, 5, lf.ephemeral.linux.2xlarge)
pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6, lf.ephemeral.linux.4xlarge)
pull / linux-focal-cuda12.6-py3.10-gcc11 / test (default, 3, 5, lf.ephemeral.linux.4xlarge.nvidia.gpu)
pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / test (default, 2, 5, lf.ephemeral.linux.g6.4xlarge.experimental.nvidia.gpu)
trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)
trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)
trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)


d4l3k (Member, Author) commented:

@pytorchbot merge -i

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged while ignoring the following 14 checks:
pull / unstable-linux-focal-cuda12.6-py3.10-gcc11-sm89-xfail / build
pull / linux-focal-py3.9-clang10 / test (default, 1, 5, lf.ephemeral.linux.4xlarge)
pull / linux-focal-py3.9-clang10 / test (crossref, 2, 2, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.9-clang10 / test (dynamo_wrapped, 1, 3, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.13-clang10 / test (default, 3, 5, lf.ephemeral.linux.4xlarge)
pull / linux-focal-py3.13-clang10 / test (crossref, 2, 2, lf.ephemeral.linux.2xlarge)
pull / linux-focal-py3.13-clang10 / test (dynamo_wrapped, 1, 3, lf.ephemeral.linux.2xlarge)
pull / linux-jammy-py3.9-gcc11 / test (default, 3, 5, lf.ephemeral.linux.2xlarge)
pull / linux-jammy-py3.10-clang15-asan / test (default, 5, 6, lf.ephemeral.linux.4xlarge)
pull / linux-focal-cuda12.6-py3.10-gcc11 / test (default, 3, 5, lf.ephemeral.linux.4xlarge.nvidia.gpu)
pull / linux-focal-cuda12.6-py3.10-gcc11-sm89 / test (default, 2, 5, lf.ephemeral.linux.g6.4xlarge.experimental.nvidia.gpu)
trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)
trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)
trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)


d4l3k (Member, Author) commented:

@pytorchbot rebase

pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot (Collaborator) commented:

Successfully rebased d4l3k/queue_block onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout d4l3k/queue_block && git pull --rebase)

d4l3k (Member, Author) commented:

@pytorchmergebot merge

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


github-actions (bot) deleted the d4l3k/queue_block branch on May 28, 2025 02:20

Reviewers

fduwjj approved these changes

+1 more reviewer

tianfengfrank approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

ciflow/trunk (Trigger trunk jobs on your pull request)
Merged
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
release notes: distributed (c10d) (release notes category)

Projects

None yet

Milestone

No milestone


5 participants

@d4l3k @pytorchmergebot @fduwjj @tianfengfrank
