[MPSInductor] Fix multistage reduction suffixes #153362


Closed
malfet wants to merge 6 commits into gh/malfet/329/base from gh/malfet/329/head

Conversation

malfet (Contributor) commented May 12, 2025 (edited)

Stack from ghstack (oldest at bottom):

The fix invalidates all variables created during the loop, except for the contents of `iterator_cache` (since stores can be done inside the reduction loop), and clears the `IteratorRangeEntry` codegen cache.

This results in the following kernel for `x / x.sum()` when `x` has 2048 elements and the max threadgroup size is 1024:

```metal
[[max_total_threads_per_threadgroup(1024)]]
kernel void generated_kernel(
    device half* out_ptr1,
    constant half* in_ptr0,
    uint2 thread_pos [[thread_position_in_grid]],
    uint2 group_pos [[thread_position_in_threadgroup]]
) {
    auto xindex = thread_pos.x;
    auto r0_index = thread_pos.y;
    threadgroup float tmp_acc_0[32];
    float tmp_acc_1 = 0;
    for(auto r0_0_cnt = 0; r0_0_cnt < 2; ++r0_0_cnt) {
        int r0_0 = 2 * r0_index + r0_0_cnt;
        auto tmp0 = static_cast<float>(in_ptr0[r0_0]);
        tmp_acc_1 += tmp0;
    }
    auto tmp1 = c10::metal::threadgroup_sum(tmp_acc_0, tmp_acc_1, r0_index * 1, 1024);
    for(auto r0_0_cnt = 0; r0_0_cnt < 2; ++r0_0_cnt) {
        int r0_0 = 2 * r0_index + r0_0_cnt;
        auto tmp2 = static_cast<float>(in_ptr0[r0_0]);
        auto tmp3 = tmp2 / tmp1;
        out_ptr1[r0_0] = static_cast<half>(tmp3);
    }
}
```
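For readers less familiar with Metal, the structure of the kernel above can be emulated on the host. The following is only a rough sketch in plain Python: `simulate_kernel` is a hypothetical name, the threads run sequentially rather than in parallel, the built-in `sum` stands in for `c10::metal::threadgroup_sum`, and the bounds check is an added assumption for sizes that do not divide evenly (the generated kernel has none).

```python
def simulate_kernel(x, max_threads=1024):
    """Host-side emulation of the multistage reduction for x / x.sum()."""
    n = len(x)
    # Elements handled by each thread; 2048 / 1024 = 2 in the example above.
    per_thread = (n + max_threads - 1) // max_threads

    # Stage 1 (first loop): every "thread" accumulates its own slice.
    partial = [0.0] * max_threads
    for r0_index in range(max_threads):
        for cnt in range(per_thread):
            r0_0 = per_thread * r0_index + cnt
            if r0_0 < n:  # assumption: guard for non-divisible sizes
                partial[r0_index] += float(x[r0_0])

    # Stage 2: combine the per-thread partial sums
    # (stand-in for c10::metal::threadgroup_sum).
    total = sum(partial)

    # Suffix (second loop): reload each input element and divide by the total.
    out = [0.0] * n
    for r0_index in range(max_threads):
        for cnt in range(per_thread):
            r0_0 = per_thread * r0_index + cnt
            if r0_0 < n:
                out[r0_0] = float(x[r0_0]) / total
    return out


result = simulate_kernel([1.0] * 2048)
```

Note that the suffix loop reloads the input (`tmp2` in the kernel) instead of reusing the value loaded in the first loop (`tmp0`), whose declaration is no longer in scope once that loop has closed.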

Fixes the compilation error reported while running `GPUTests.test_pattern_matcher_multi_user_mps` and `GPUTests.test_weight_norm_bwd_mps`.

Fixes #152155

The inductor tests are still failing, though, so the variable invalidation needs further refinement.
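The reason loop-scoped variables need invalidating can be illustrated with a toy code generator. This is a minimal sketch, not the Inductor implementation: `ToyKernelWriter`, `emit`, and every method name are hypothetical, and the exemption for `iterator_cache` described above is not modeled.

```python
class ToyKernelWriter:
    """Hypothetical miniature code generator with a load cache."""

    def __init__(self):
        self.lines = []        # emitted source lines
        self.cache = {}        # expression -> variable name
        self.counter = 0
        self.loop_vars = None  # names declared inside the currently open loop

    def load(self, expr):
        # Reuse an earlier variable if the cache still holds one.
        if expr in self.cache:
            return self.cache[expr]
        name = f"tmp{self.counter}"
        self.counter += 1
        self.lines.append(f"auto {name} = {expr};")
        self.cache[expr] = name
        if self.loop_vars is not None:
            self.loop_vars.add(name)
        return name

    def begin_loop(self):
        self.lines.append("for (...) {")
        self.loop_vars = set()

    def end_loop(self, invalidate=True):
        self.lines.append("}")
        if invalidate:
            # Drop cache entries whose variable was declared in the loop body,
            # because that declaration is out of scope after the closing brace.
            self.cache = {
                e: v for e, v in self.cache.items() if v not in self.loop_vars
            }
        self.loop_vars = None


def emit(invalidate):
    """Emit two loops that both read the same input expression."""
    w = ToyKernelWriter()
    w.begin_loop()
    w.load("in_ptr0[r0_0]")
    w.end_loop(invalidate)
    w.begin_loop()
    name = w.load("in_ptr0[r0_0]")
    w.end_loop(invalidate)
    return name, w.lines


fixed_name, fixed_lines = emit(invalidate=True)
stale_name, stale_lines = emit(invalidate=False)
```

Without invalidation, the second loop is handed `tmp0`, a name declared only inside the first loop, so the emitted source would not compile. With invalidation, the second loop gets a fresh declaration.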

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

[ghstack-poisoned]
pytorch-bot (bot) commented May 12, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153362

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 66 Pending, 2 Unrelated Failures

As of commit 31f175e with merge base e4f2282:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot (bot) added the ciflow/inductor, ciflow/mps, and module: inductor labels on May 12, 2025
malfet requested review from dcci and jansel on May 12, 2025 03:45
malfet added the topic: bug fixes and release notes: mps labels on May 12, 2025
dcci (Member) commented:

LGTM

malfet (Contributor, Author) commented:

@pytorchbot merge -f "Lint + MPS are green"


pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


Reviewers

jansel approved these changes

dcci approved these changes

manuelcandales approved these changes

Assignees

No one assigned

Labels

ciflow/inductor, ciflow/mps, Merged, module: inductor, release notes: mps, topic: bug fixes

Projects

None yet

Milestone

No milestone


6 participants

@malfet @dcci @pytorchmergebot @jansel @manuelcandales
