[MPS] Make fused rms_norm traceable #150661
Conversation
By declaring it as an ATen op.
Fixes #150629
[ghstack-poisoned]
pytorch-bot bot commented Apr 4, 2025
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150661
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 32 Pending
As of commit b3dd4b4 with merge base 300e0ee.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Attention! native_functions.yaml was changed
If you are adding a new function or defaulted argument to native_functions.yaml, you cannot use it from pre-existing Python frontend code until our FC window passes (two weeks). Split your PR into two PRs: one which adds the new C++ functionality, and one that makes use of it from Python, and land them two weeks apart. See https://github.com/pytorch/pytorch/wiki/PyTorch's-Python-Frontend-Backward-and-Forward-Compatibility-Policy#forwards-compatibility-fc for more info.
Caused by:
By declaring it as an ATen op.
Fixes #150629
[ghstack-poisoned]
kimishpatel commented Apr 4, 2025
Can you add to the summary why #150629 introduced a regression? Also, do we know whether the inductor-generated code will be more performant than the native kernel?
Which is a regression, introduced by#150629 (comment) which I should have reviewed more thoroughly.- Defined `_rms_norm_fused`, added MPS-only implementation for it and dispatch from native::rms_norm_symint there in no-grad mode- Defined a decomp for it in `torch/_inductor/decomposition.py`- Added unit test to avoid those regressions in the futureTODO/Ideas: - Perhaps define it as non-decomposable - Make `torch.compiler.is_compiling` reflect to some sort of `at::Context` propertyFixes#150629cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov[ghstack-poisoned]
torch/_inductor/decomposition.py Outdated
| return aten.leaky_relu(self, negative_slope), torch.Tensor() |
| @register_decomposition(aten._rms_norm_fused) |
We don't want this to happen. We want inductor to use the fused implementation, not to decompose it.
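One way to express that, sketched here under the assumption that the fused op stays a real ATen op (a later revision of this PR registers it, renamed `_fused_rms_norm`, as a fallback in `torch/_inductor/lowering.py`), is to register an inductor fallback instead of a decomposition. The `make_fallback` helper below lives in that file; the op name is taken from this PR, so treat both as assumptions about the build:

```python
# Sketch: keep the op opaque to inductor so compiled graphs call the fused
# kernel directly, instead of decomposing it into elementwise/reduction ops.
# Assumes a PyTorch build that already defines the aten::_fused_rms_norm op.
import torch
from torch._inductor.lowering import make_fallback

make_fallback(torch.ops.aten._fused_rms_norm)
```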
| dispatch: |
|   CompositeImplicitAutograd: rms_norm_symint |
| - func: _rms_norm_fused(Tensor input, int normalized_shape_ndim, Tensor weight, float eps) -> Tensor |
tbh I think you're making your life more complex for no good reason here.
You can make the op above `CompositeExplicitAutograd`.
| dispatch: |
|   CompositeImplicitAutograd: rms_norm_symint |
| - func: _rms_norm_fused(Tensor input, int normalized_shape_ndim, Tensor weight, float eps) -> Tensor |
I am not sure if there is some naming convention, or if it's just coincidence, but when I search for the word "fused" in native_functions.yaml, it always comes before the name of the function. If there is such a naming convention, then this should be changed to `_fused_rms_norm`.
Which is a regression, introduced by #150629 (comment), which I should have reviewed more thoroughly.
- Defined `_fused_rms_norm`, added an MPS-only implementation for it, and dispatch to it from `rms_norm_symint`, which is registered as `CompositeImplicitAutograd`, i.e. it is not supposed to do any computations over Tensors, only dispatch to other ops
- Registered `_fused_rms_norm` as a fallback in `torch/_inductor/lowering.py`
- Added a unit test to avoid those regressions in the future

TODO:
- Get rid of this op, change the `rms_norm_symint` definition to `CompositeExplicitAutograd`, and implement the backward function in `tools/autograd/derivatives.yaml`
- Benchmark the compiler and re-enable the decomp as follows when compiled code is faster:
```python
@register_decomposition(aten._rms_norm_fused)
def rms_norm_fused(
    self: torch.Tensor, ndim: int, weight: torch.Tensor, eps: float
) -> torch.Tensor:
    dtr = [self.dim() - i - 1 for i in range(ndim)]
    return self * weight * (self.pow(2).mean(dtr, keepdim=True).add(eps).rsqrt())
```

Fixes #150629

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
malfet commented Apr 17, 2025
@pytorchbot merge -f "let's test in prod"
pytorchmergebot commented Apr 17, 2025
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
malfet commented Apr 17, 2025
@pytorchbot revert -m *Has decomp started to fail again" -c nosignal
❌ 🤖 pytorchbot command failed:
malfet commented Apr 17, 2025
@pytorchbot revert -m "Has decomp started to fail again" -c nosignal
pytorchmergebot commented Apr 17, 2025
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 682f09e. Reverted #150661 on behalf of https://github.com/malfet due to Has decomp started to fail again ([comment](#150661 (comment)))
pytorchmergebot commented Apr 17, 2025
@malfet your PR has been successfully reverted.
Which is a regression, introduced by #150629 (comment), which I should have reviewed more thoroughly.
- Defined `_fused_rms_norm`, added an MPS-only implementation for it, and dispatch to it from `rms_norm_symint`, which is registered as `CompositeImplicitAutograd`, i.e. it is not supposed to do any computations over Tensors, only dispatch to other ops
- Registered `_fused_rms_norm` as a fallback in `torch/_inductor/lowering.py`
- Added a unit test to avoid those regressions in the future

TODO:
- Get rid of this op, change the `rms_norm_symint` definition to `CompositeExplicitAutograd`, and implement the backward function in `tools/autograd/derivatives.yaml`
- Benchmark the compiler and re-enable the decomp as follows when compiled code is faster:
```python
@register_decomposition(aten._rms_norm_fused)
def rms_norm_fused(
    self: torch.Tensor, ndim: int, weight: torch.Tensor, eps: float
) -> torch.Tensor:
    dtr = [self.dim() - i - 1 for i in range(ndim)]
    return self * weight * (self.pow(2).mean(dtr, keepdim=True).add(eps).rsqrt())
```

Fixes #150629

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov

[ghstack-poisoned]
malfet commented Apr 17, 2025
@pytorchbot merge -f "Re-added has decomposition"
pytorchmergebot commented Apr 17, 2025
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
Which is a regression, introduced by #150629 (comment), which I should have reviewed more thoroughly.
- Defined `_fused_rms_norm`, added an MPS-only implementation for it, and dispatch to it from `rms_norm_symint`, which is registered as `CompositeImplicitAutograd`, i.e. it is not supposed to do any computations over Tensors, only dispatch to other ops
- Registered `_fused_rms_norm` as a fallback in `torch/_inductor/lowering.py`
- Added a unit test to avoid those regressions in the future

TODO:
- Get rid of this op, change the `rms_norm_symint` definition to `CompositeExplicitAutograd`, and implement the backward function in `tools/autograd/derivatives.yaml`

Fixes #150629
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov