assert on all_reduce_event only if it's not CPU device. #150316
Conversation
This appears to be a diff that was exported from Phabricator, but the PR author does not have sufficient permissions to run CI. @Ritesh1905, please do step 2 of the internal wiki to get write access so you do not need to get CI approvals in the future. If you think this is a mistake, please contact the PyTorch Dev Infra team.
pytorch-bot bot commented Mar 31, 2025 • edited
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150316
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: There is 1 currently active SEV. If your PR is affected, please view it below.
✅ You can merge normally! (3 Unrelated Failures)
As of commit 0ad8a80 with merge base f94ac26:
BROKEN TRUNK - The following job failed but was also present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
facebook-github-bot commented Mar 31, 2025
This pull request was exported from Phabricator. Differential Revision: D72176406
kwen2501 commented Mar 31, 2025
@albanD Is it true that for CPU, `record_event()` returns None?
kwen2501 left a comment
The change looks okay to me. Asking @albanD for guidance here.
facebook-github-bot commented Apr 1, 2025
This pull request was exported from Phabricator. Differential Revision: D72176406
Summary: Pull Request resolved: pytorch#150316. For CPU-based runs, `all_reduce_event` would be None, since it is the result of `all_reduce_stream.record_event()`, which does nothing other than return None when the device type is CPU.
Test Plan: CI
Reviewed By: kwen2501
Differential Revision: D72176406
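A minimal sketch of the guard this change describes, assuming a CUDA-style stream API; `record_all_reduce_event`, `all_reduce_stream`, and `device` here are stand-ins for the real FSDP internals, not the actual implementation:

```python
import torch

def record_all_reduce_event(all_reduce_stream, device: torch.device):
    # On CUDA, Stream.record_event() returns an event that later streams can
    # wait on; per the PR summary, the CPU stream stub has nothing to record,
    # so the call yields None.
    all_reduce_event = all_reduce_stream.record_event()
    if device.type != "cpu":
        # Asserting only makes sense on accelerator devices.
        assert all_reduce_event is not None
    return all_reduce_event
```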
facebook-github-bot commented Apr 1, 2025
This pull request was exported from Phabricator. Differential Revision: D72176406
mori360 left a comment
The pending check is unstable, see #150420.
facebook-github-bot commented Apr 2, 2025
@pytorchbot merge
(Initiating merge automatically since Phabricator Diff has merged)
pytorchmergebot commented Apr 2, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
```python
if (
    self.comm_ctx.reduce_scatter_state is not None
    and self.comm_ctx.reduce_scatter_state.event is not None
):
    self.device_handle.current_stream().wait_event(
        self.comm_ctx.reduce_scatter_state.event
    )
self.comm_ctx.reduce_scatter_state = None
```
nit: maybe it's better to keep the original style and do a double if statement like:

```python
if self.comm_ctx.reduce_scatter_state is not None:
    if self.comm_ctx.reduce_scatter_state.event is not None:
        self.device_handle.current_stream().wait_event(
            self.comm_ctx.reduce_scatter_state.event
        )
    self.comm_ctx.reduce_scatter_state = None
```

There is semantic meaning to `if self.comm_ctx.reduce_scatter_state is not None:` -- namely, that there was a preceding reduce-scatter that we have not waited on.
If we ever want to refactor the stashing and waiting logic to be handled in a single API, it would be clearer if we only cleared when we need to.
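A hedged sketch of what that single-API refactor could look like; `comm_ctx` and `device_handle` mirror the names in the diff above, but the helper itself is hypothetical and not part of this PR:

```python
def wait_and_clear_reduce_scatter(comm_ctx, device_handle) -> None:
    # Hypothetical helper following the reviewer's suggestion: wait on a
    # stashed reduce-scatter event, and clear the stash only when something
    # was actually stashed.
    state = comm_ctx.reduce_scatter_state
    if state is None:
        return  # no preceding reduce-scatter to wait on
    if state.event is not None:  # event is None on CPU runs
        device_handle.current_stream().wait_event(state.event)
    comm_ctx.reduce_scatter_state = None
```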
Summary: For CPU-based runs, `all_reduce_event` would be None, since it is the result of `all_reduce_stream.record_event()`, which does nothing other than return None when the device type is CPU.
Test Plan: CI
Differential Revision: D72176406
Pull Request resolved: pytorch#150316
Approved by: https://github.com/kwen2501, https://github.com/weifengpy, https://github.com/mori360
cc @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o