assert on all_reduce_event only if it's not CPU device. #150316


Closed


@Ritesh1905 (Contributor) commented Mar 31, 2025

Summary: For CPU-based runs, `all_reduce_event` is None, because it is the result of `all_reduce_stream.record_event()`, which does nothing except return None when the device type is CPU.

Test Plan: CI

Differential Revision: D72176406

cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o
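The summary describes a guard: on CPU the recorded event is None, so the assert on `all_reduce_event` should only run for non-CPU devices. The following is a minimal sketch of that guard under stated assumptions. `FakeEvent`, `FakeCPUStream`, `FakeAcceleratorStream` and `record_all_reduce_event` are hypothetical stand-ins written for illustration. They are not the FSDP code touched by this PR, and they are not torch classes.

```python
from typing import Optional


class FakeEvent:
    """Hypothetical stand-in for a device event object (not a torch class)."""


class FakeCPUStream:
    """Hypothetical CPU 'stream': record_event() does nothing and returns None."""

    def record_event(self) -> None:
        return None


class FakeAcceleratorStream:
    """Hypothetical accelerator stream: record_event() returns an event."""

    def record_event(self) -> FakeEvent:
        return FakeEvent()


def record_all_reduce_event(device_type: str, all_reduce_stream) -> Optional[FakeEvent]:
    """Record the all-reduce event and assert on it only for non-CPU devices (hypothetical helper)."""
    all_reduce_event = all_reduce_stream.record_event()
    if device_type != "cpu":
        # Accelerator streams are expected to return a real event to wait on.
        assert all_reduce_event is not None
    # On CPU the value is None, so callers must treat the event as optional.
    return all_reduce_event
```

With these stand-ins, `record_all_reduce_event("cpu", FakeCPUStream())` returns None without tripping the assert. `record_all_reduce_event("cuda", FakeAcceleratorStream())` returns a `FakeEvent`. Keying the check on the device type, rather than dropping the assert entirely, keeps the safety check for accelerator runs, where a missing event would indicate a real bug.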

@pytorch-bot commented:

This appears to be a diff that was exported from Phabricator, but the PR author does not have sufficient permissions to run CI. @Ritesh1905, please do step 2 of internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the PyTorch Dev Infra team.

@linux-foundation-easycla (bot) commented Mar 31, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: Ritesh1905 / name: Rithesh Baradi (0ad8a80)

@pytorch-bot added the `oncall: distributed` (Add this issue/PR to distributed oncall triage queue) and `release notes: distributed (fsdp)` (release notes category) labels Mar 31, 2025

@pytorch-bot commented Mar 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150316

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

✅ You can merge normally! (3 Unrelated Failures)

As of commit 0ad8a80 with merge base f94ac26:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot (Contributor) commented:

This pull request was exported from Phabricator. Differential Revision: D72176406

@kwen2501 (Collaborator) commented:

@albanD Is it true that for CPU, `stream.record_event()` would return None? Just for our education.

@kwen2501 (Collaborator) left a comment:

The change looks okay to me. Asking @albanD for guidance here.

@pytorch-bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label Mar 31, 2025

Summary: Pull Request resolved: pytorch#150316
For CPU based runs, `all_reduce_event` would be None since this is the result of the `all_reduce_stream.record_event()`, which does not do much other than returning None when device type is CPU.
Test Plan: CI
Reviewed By: kwen2501
Differential Revision: D72176406

@mori360 (Contributor) left a comment:

The pending check is unstable, #150420

@facebook-github-bot (Contributor) commented:

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)


@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging: Check the merge workflow status here

Comment on lines +417 to +424
if (
    self.comm_ctx.reduce_scatter_state is not None
    and self.comm_ctx.reduce_scatter_state.event is not None
):
    self.device_handle.current_stream().wait_event(
        self.comm_ctx.reduce_scatter_state.event
    )
self.comm_ctx.reduce_scatter_state = None
Collaborator review comment:

nit: maybe it's better to keep the original style and do a double if statement like:

if self.comm_ctx.reduce_scatter_state is not None:
    if self.comm_ctx.reduce_scatter_state.event is not None:
        self.device_handle.current_stream().wait_event(
            self.comm_ctx.reduce_scatter_state.event
        )
    self.comm_ctx.reduce_scatter_state = None

There is semantic meaning to `if self.comm_ctx.reduce_scatter_state is not None:`, namely that there was a preceding reduce-scatter that we have not waited on.
If we ever want to refactor the stashing and waiting logic to be handled in a single API, it would be clearer if we only cleared the state when we need to.
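The two styles under discussion can be compared side by side in a small, self-contained sketch. `ReduceScatterState`, `CommContext`, `FakeStream`, `wait_combined` and `wait_nested` are hypothetical stand-ins, not the real FSDP classes. The `waited` list exists only to make the waits observable. The sketch assumes, as the PR summary states, that the event is None on CPU.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReduceScatterState:
    # Hypothetical stand-in: on CPU, `event` is None because record_event() returns None.
    event: Optional[object]


@dataclass
class CommContext:
    # Hypothetical stand-in: None means no reduce-scatter is pending.
    reduce_scatter_state: Optional[ReduceScatterState] = None


class FakeStream:
    """Hypothetical stream that records which events it was asked to wait on."""

    def __init__(self) -> None:
        self.waited: List[object] = []

    def wait_event(self, event: object) -> None:
        self.waited.append(event)


def wait_combined(ctx: CommContext, stream: FakeStream) -> None:
    # Style in the PR: one combined condition, then an unconditional clear.
    if (
        ctx.reduce_scatter_state is not None
        and ctx.reduce_scatter_state.event is not None
    ):
        stream.wait_event(ctx.reduce_scatter_state.event)
    ctx.reduce_scatter_state = None


def wait_nested(ctx: CommContext, stream: FakeStream) -> None:
    # Style suggested in review: the outer check means "a reduce-scatter is
    # pending"; the inner check skips the wait when no event was recorded (CPU).
    if ctx.reduce_scatter_state is not None:
        if ctx.reduce_scatter_state.event is not None:
            stream.wait_event(ctx.reduce_scatter_state.event)
        ctx.reduce_scatter_state = None
```

Both functions leave `reduce_scatter_state` as None and wait only when an event exists, so their observable behavior matches. The nested form only clears the state inside the "pending reduce-scatter" branch, which is the semantic distinction the review points out.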

amathewc pushed a commit to amathewc/pytorch that referenced this pull request Apr 17, 2025
Summary: For CPU based runs, `all_reduce_event` would be None since this is the result of the `all_reduce_stream.record_event()`, which does not do much other than returning None when device type is CPU.
Test Plan: CI
Differential Revision: D72176406
Pull Request resolved: pytorch#150316
Approved by: https://github.com/kwen2501, https://github.com/weifengpy, https://github.com/mori360

Reviewers

@awgu left review comments

@kwen2501 approved these changes

@weifengpy approved these changes

@mori360 approved these changes

@albanD: awaiting requested review

Assignees

No one assigned

Labels

ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), fb-exported, Merged, oncall: distributed (Add this issue/PR to distributed oncall triage queue), release notes: distributed (fsdp) (release notes category)


7 participants: @Ritesh1905 @facebook-github-bot @kwen2501 @pytorchmergebot @awgu @weifengpy @mori360
