[MPS] Make fused rms_norm traceable #150661


Closed
malfet wants to merge 9 commits into gh/malfet/266/base from gh/malfet/266/head

Conversation

@malfet (Contributor) commented Apr 4, 2025 (edited)

Stack from ghstack (oldest at bottom):

This fixes a regression introduced by #150629 (comment), which I should have reviewed more thoroughly.

  • Defined `_fused_rms_norm`, added an MPS-only implementation for it, and dispatch to it from `rms_norm_symint`, which is registered as `CompositeImplicitAutograd`, i.e. it is not supposed to do any computation over Tensors, only dispatch to other ops
  • Registered `_fused_rms_norm` as a fallback in `torch/_inductor/lowering.py` (see the sketch after this list)
  • Added a unit test to avoid such regressions in the future
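
A minimal sketch (not the PR's exact diff) of what that fallback registration typically looks like, assuming the `make_fallback` helper in `torch/_inductor/lowering.py`:

```python
# Hypothetical sketch of the fallback registration described above; the
# actual change in torch/_inductor/lowering.py may differ in detail.
import torch
from torch._inductor.lowering import make_fallback

aten = torch.ops.aten

# A fallback tells inductor to emit a call to the ATen kernel instead of
# decomposing or lowering the op, so the hand-written fused MPS kernel runs.
make_fallback(aten._fused_rms_norm)
```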

TODO:

  • Get rid of this op, change the `rms_norm_symint` definition to `CompositeExplicitAutograd`, and implement the backward function in `tools/autograd/derivatives.yaml`
  • Benchmark the compiler and re-enable the decomp as follows when compiled code is faster:
```python
@register_decomposition(aten._rms_norm_fused)
def rms_norm_fused(
    self: torch.Tensor, ndim: int, weight: torch.Tensor, eps: float
) -> torch.Tensor:
    dtr = [self.dim() - i - 1 for i in range(ndim)]
    return self * weight * (self.pow(2).mean(dtr, keepdim=True).add(eps).rsqrt())
```
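
For reference, a quick eager-mode sanity check (not part of the PR) that the decomposition above matches `F.rms_norm` numerically:

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, 16)
w = torch.randn(16)
ndim, eps = 1, 1e-6

# Reduce over the last `ndim` dimensions, as the decomposition does.
dtr = [x.dim() - i - 1 for i in range(ndim)]
manual = x * w * x.pow(2).mean(dtr, keepdim=True).add(eps).rsqrt()

torch.testing.assert_close(manual, F.rms_norm(x, (16,), w, eps=eps))
```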

Fixes #150629

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

By declaring it as an ATen op. Fixes #150629 [ghstack-poisoned]
@pytorch-bot bot commented Apr 4, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150661

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 32 Pending

As of commit b3dd4b4 with merge base 300e0ee:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the ciflow/mps (Run MPS tests (subset of trunk)) and release notes: mps (Release notes category) labels on Apr 4, 2025
@github-actions (Contributor)

Attention! native_functions.yaml was changed

If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs, one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.



malfet changed the title [MPS] Make fused rmsnorm traceable → [MPS] Make fused rms_norm traceable on Apr 4, 2025
malfet added the topic: bug fixes (topic category) label on Apr 4, 2025
By declaring it as an ATen op. Fixes #150629 [ghstack-poisoned]
@kimishpatel (Contributor)

Can you add to the summary why #150629 introduced a regression? Also, do we know if inductor-generated code will be more performant than the native kernel?


Which is a regression, introduced by #150629 (comment) which I should have reviewed more thoroughly.

- Defined `_rms_norm_fused`, added MPS-only implementation for it and dispatch from native::rms_norm_symint there in no-grad mode
- Defined a decomp for it in `torch/_inductor/decomposition.py`
- Added unit test to avoid those regressions in the future

TODO/Ideas:

- Perhaps define it as non-decomposable
- Make `torch.compiler.is_compiling` reflect to some sort of `at::Context` property

Fixes #150629

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
malfet added a commit that referenced this pull request Apr 4, 2025
By declaring it as an ATen op. Fixes #150629. ghstack-source-id: 1c716d9. Pull Request resolved: #150661
return aten.leaky_relu(self, negative_slope), torch.Tensor()


@register_decomposition(aten._rms_norm_fused)

We don't want this to happen. We want inductor to use the fused implementation, not to decompose it.

dispatch:
  CompositeImplicitAutograd: rms_norm_symint

- func: _rms_norm_fused(Tensor input, int normalized_shape_ndim, Tensor weight, float eps) -> Tensor
@albanD (Collaborator) commented Apr 4, 2025 (edited)

tbh I think you're making your life more complex for no good reason here.
You can make the op above CompositeExplicitAutograd.
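
One reading of that suggestion, as a minimal sketch against the `dispatch:` snippet shown above (the MPS kernel name is hypothetical, and the backward would then need an entry in `tools/autograd/derivatives.yaml`):

```yaml
dispatch:
  CompositeExplicitAutograd: rms_norm_symint
  MPS: fused_rms_norm_mps   # hypothetical hand-written fused kernel
```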

[ghstack-poisoned]
dispatch:
  CompositeImplicitAutograd: rms_norm_symint

- func: _rms_norm_fused(Tensor input, int normalized_shape_ndim, Tensor weight, float eps) -> Tensor

I am not sure if there is a naming convention or if it's just coincidence, but when I search for the word "fused" in native_functions.yaml, it always comes before the name of the function. If there is such a naming convention, then this should be changed to `_fused_rms_norm`.

[ghstack-poisoned]
malfet added a commit that referenced this pull request Apr 17, 2025
By declaring it as an ATen op. Fixes #150629. ghstack-source-id: a1bbdae. Pull Request resolved: #150661
@malfet (Contributor, Author)

@pytorchbot merge -f "let's test in prod"


@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging: check the merge workflow status here.

@malfet (Contributor, Author)

@pytorchbot revert -m *Has decomp started to fail again" -c nosignal

@pytorch-bot

❌ 🤖 pytorchbot command failed:

Got EOF while in a quoted string

Try `@pytorchbot --help` for more info.

@malfet (Contributor, Author)

@pytorchbot revert -m "Has decomp started to fail again" -c nosignal


@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team.

pytorchmergebot added a commit that referenced this pull request Apr 17, 2025
This reverts commit 682f09e. Reverted #150661 on behalf of https://github.com/malfet due to "Has decomp started to fail again" ([comment](#150661 (comment)))
@pytorchmergebot (Collaborator)

@malfet your PR has been successfully reverted.

pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels on Apr 17, 2025
malfet added a commit that referenced this pull request Apr 17, 2025
By declaring it as an ATen op. Fixes #150629. ghstack-source-id: de7f161. Pull Request resolved: #150661
@malfet (Contributor, Author)

@pytorchbot merge -f "Re-added has decomposition"


@pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging: check the merge workflow status here.


Reviewers

@albanD left review comments

@jansel approved these changes

@manuelcandales approved these changes

Awaiting requested review from @kulinseth

Assignees

No one assigned

Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/mps (Run MPS tests (subset of trunk)), Merged, module: inductor, release notes: mps (Release notes category), Reverted, topic: bug fixes (topic category)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

7 participants

@malfet @kimishpatel @pytorchmergebot @jansel @albanD @manuelcandales
