[MPSInductor] Fix multistage reduction suffixes #153362

Closed

malfet wants to merge 6 commits into gh/malfet/329/base from gh/malfet/329/head

malfet (Contributor) commented May 12, 2025

Stack from ghstack (oldest at bottom):

-> [MPSInductor] Fix multistage reduction suffixes #153362

Fix this by invalidating all variables created during the loop, except for the context of `iterator_cache` (a store can happen inside the reduction loop), and by clearing the `IteratorRangeEntry` codegen cache.

This results in the following kernel for `x / x.sum()` when `x` has 2048 elements and the max thread group size is 1024:

    [[max_total_threads_per_threadgroup(1024)]]
    kernel void generated_kernel(
        device half* out_ptr1,
        constant half* in_ptr0,
        uint2 thread_pos [[thread_position_in_grid]],
        uint2 group_pos [[thread_position_in_threadgroup]]
    ) {
        auto xindex = thread_pos.x;
        auto r0_index = thread_pos.y;
        threadgroup float tmp_acc_0[32];
        float tmp_acc_1 = 0;
        for(auto r0_0_cnt = 0; r0_0_cnt < 2; ++r0_0_cnt) {
            int r0_0 = 2 * r0_index + r0_0_cnt;
            auto tmp0 = static_cast<float>(in_ptr0[r0_0]);
            tmp_acc_1 += tmp0;
        }
        auto tmp1 = c10::metal::threadgroup_sum(tmp_acc_0, tmp_acc_1, r0_index * 1, 1024);
        for(auto r0_0_cnt = 0; r0_0_cnt < 2; ++r0_0_cnt) {
            int r0_0 = 2 * r0_index + r0_0_cnt;
            auto tmp2 = static_cast<float>(in_ptr0[r0_0]);
            auto tmp3 = tmp2 / tmp1;
            out_ptr1[r0_0] = static_cast<half>(tmp3);
        }
    }
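
For readers who want to check what the kernel computes without a Metal device, here is a rough pure-Python model of its two stages. It is only a sketch: `simulate_kernel` and the constants are illustrative names, the threads run sequentially, the plain `sum` merely stands in for `c10::metal::threadgroup_sum`, and `half` rounding is ignored.

```python
# Pure-Python model of the generated kernel for x / x.sum()
# with 2048 elements and a threadgroup of 1024 threads.
NUM_THREADS = 1024      # max thread group size from the description
ELEMS_PER_THREAD = 2    # 2048 / 1024, the bound of the r0_0_cnt loops


def simulate_kernel(x):
    assert len(x) == NUM_THREADS * ELEMS_PER_THREAD

    # Stage 1: each thread accumulates a private partial sum (tmp_acc_1).
    partial = [0.0] * NUM_THREADS
    for r0_index in range(NUM_THREADS):
        for r0_0_cnt in range(ELEMS_PER_THREAD):
            r0_0 = ELEMS_PER_THREAD * r0_index + r0_0_cnt
            partial[r0_index] += float(x[r0_0])

    # Stand-in for threadgroup_sum: combine the per-thread partial sums (tmp1).
    total = sum(partial)

    # Stage 2: a second loop reloads the input (tmp2) instead of reusing
    # tmp0, which only existed inside the first loop's scope.
    out = [0.0] * len(x)
    for r0_index in range(NUM_THREADS):
        for r0_0_cnt in range(ELEMS_PER_THREAD):
            r0_0 = ELEMS_PER_THREAD * r0_index + r0_0_cnt
            tmp2 = float(x[r0_0])
            out[r0_0] = tmp2 / total
    return out
```

Calling `simulate_kernel([1.0] * 2048)` should give a list in which every entry equals `1 / 2048`.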

Fixes the compilation failure reported while running `GPUTests.test_pattern_matcher_multi_user_mps` and `GPUTests.test_weight_norm_bwd_mps`.

Fixes #152155

Some inductor tests are still failing, though, so the variable invalidation needs further refinement.
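
To make the variable invalidation concrete, here is a toy model of a codegen that caches loaded expressions. It is not Inductor's actual API: `ToyCodegen`, `emit_two_loops`, and the cache layout are hypothetical names made up for illustration. Without invalidation, the second loop reuses a variable that was declared inside the first loop's braces, which is the kind of out-of-scope reference that would fail to compile.

```python
class ToyCodegen:
    """Hypothetical miniature codegen with a load cache (not Inductor's API)."""

    def __init__(self, invalidate_on_loop_exit):
        self.invalidate_on_loop_exit = invalidate_on_loop_exit
        self.lines = []        # emitted source lines
        self.cache = {}        # expression -> variable name
        self.loop_vars = None  # expressions first cached inside the open loop
        self.counter = 0

    def load(self, expr):
        # Reuse a previously emitted variable if the expression is cached.
        if expr in self.cache:
            return self.cache[expr]
        name = f"tmp{self.counter}"
        self.counter += 1
        indent = "    " if self.loop_vars is not None else ""
        self.lines.append(f"{indent}auto {name} = {expr};")
        self.cache[expr] = name
        if self.loop_vars is not None:
            self.loop_vars.append(expr)
        return name

    def begin_loop(self):
        self.lines.append("for (...) {")
        self.loop_vars = []

    def end_loop(self):
        self.lines.append("}")
        if self.invalidate_on_loop_exit:
            # Variables declared inside the loop go out of scope here,
            # so forget them and force later loops to re-emit the load.
            for expr in self.loop_vars:
                del self.cache[expr]
        self.loop_vars = None


def emit_two_loops(invalidate):
    cg = ToyCodegen(invalidate)
    cg.begin_loop()
    first = cg.load("in_ptr0[r0_0]")
    cg.end_loop()
    cg.begin_loop()
    second = cg.load("in_ptr0[r0_0]")
    cg.end_loop()
    return first, second, cg.lines
```

With `emit_two_loops(False)` both loops refer to `tmp0` even though it is declared only in the first one. With `emit_two_loops(True)` the second loop declares its own `tmp1`.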

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

Update

7e15c9b

[ghstack-poisoned]

pytorch-bot bot commented May 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/153362

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ 66 Pending, 2 Unrelated Failures

As of commit 31f175e with merge base e4f2282:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

pull / linux-focal-py3.13-clang10 / test (default, 1, 5, linux.4xlarge) (gh) (detected as infra flaky with no log or failing log classifier)

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, linux.2xlarge) (gh) (trunk failure)
Process completed with exit code 1.

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the ciflow/inductor, ciflow/mps, and module: inductor labels May 12, 2025

malfet requested review from dcci and jansel May 12, 2025 03:45

malfet added the topic: bug fixes and release notes: mps labels May 12, 2025

malfet added 4 commits May 11, 2025 21:04

Update

a964d84

[ghstack-poisoned]

Update

f04e55c

[ghstack-poisoned]

Update

084efd7

[ghstack-poisoned]

Update

bb83fea

[ghstack-poisoned]

malfet requested a review from manuelcandales May 12, 2025 13:37

manuelcandales approved these changes May 12, 2025

malfet (Member) commented May 13, 2025

@pytorchbot merge -f "Lint + MPS are green"

pytorchmergebot added the merging label

May 13, 2025

pytorchmergebot (Collaborator) commented May 13, 2025

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use `-f` as last resort and instead consider `-i/--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here