[MPSInductor] Fix multistage reduction suffixes #153362
pytorch-bot (bot) commented May 12, 2025 • edited
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153362
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 66 Pending, 2 Unrelated Failures
As of commit 31f175e with merge base e4f2282:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
dcci commented May 12, 2025
LGTM
malfet commented May 13, 2025
@pytorchbot merge -f "Lint + MPS are green"
pytorchmergebot commented May 13, 2025
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Stack from ghstack (oldest at bottom):
By invalidating all variables created during the loop, except for the context of `iterator_cache` (as stores can be done inside the reduction loop), and by clearing the `IteratorRangeEntry` codegen cache.
This results in the following kernel for `x / x.sum()` if the size of `x` is 2048 and the max thread group size is 1024:
Fixes the compilation error reported while running `GPUTests.test_pattern_matcher_multi_user_mps` and `GPUTests.test_weight_norm_bwd_mps`.
Fixes #152155
Inductor tests are still failing, though, so the variable invalidation needs further refinement.
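To illustrate why loop-local variables cannot be reused after a multistage reduction loop, here is a minimal pure-Python sketch of how `x / x.sum()` might be staged when the input (2048 elements) is larger than the thread group (1024 threads). It is not the Metal kernel that Inductor generates; the function name `multistage_normalize` and the parameter `max_threads` are hypothetical and exist only for this example.

```python
def multistage_normalize(x, max_threads=1024):
    """Simulate x / x.sum() for one thread group (illustrative only)."""
    n = len(x)
    num_threads = min(n, max_threads)

    # Stage 1: each simulated thread strides over the input and keeps a
    # private partial sum. `idx` and the loaded value exist only inside
    # this loop body.
    partial = [0.0] * num_threads
    for tid in range(num_threads):
        for idx in range(tid, n, num_threads):
            partial[tid] += x[idx]

    # Stage 2: combine the per-thread partials. This stands in for a
    # thread-group-wide reduction.
    total = sum(partial)

    # Stage 3 (the suffix): a second strided loop. The index and the
    # loaded value must be recomputed here, because whatever stage 1
    # defined inside its loop is out of scope once that loop closes.
    out = [0.0] * n
    for tid in range(num_threads):
        for idx in range(tid, n, num_threads):
            out[idx] = x[idx] / total
    return out
```

With 2048 inputs and 1024 simulated threads, each thread handles two elements per loop. A code generator that cached a stage 1 variable and referred to it in stage 3 would emit a reference to a name that is no longer in scope, which is the kind of compilation failure that invalidating those variables is meant to avoid.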
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov