
[GraphPartition] cache get_free_symbol_uses #166338


Closed
BoyuanFeng wants to merge 14 commits into main from bf/partition-cache-free-symbols

Conversation

@BoyuanFeng (Contributor) commented Oct 27, 2025 (edited by pytorch-bot)

Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs.

```python
def get_scheduler_node_symbol_uses(
    node: BaseSchedulerNode,
) -> OrderedSet[sympy.Symbol]:
    """
    Gets symbols used in node.
    """
    if isinstance(node, FusedSchedulerNode):
        return OrderedSet().union(
            *(get_scheduler_node_symbol_uses(snode) for snode in node.snodes)
        )
    assert node.node is not None
    free_symbol_uses = node.node.get_free_symbol_uses()
    free_symbol_uses.update(
        *(get_layout_symints(ir_node) for ir_node in node.node.get_outputs())
    )
    return free_symbol_uses
```

I empirically observed that `get_free_symbol_uses()` becomes slower for larger graphs. Specifically, I tried an aten fallback for torchtitan, which results in 10k+ aten nodes. When processing the 600th node, a single call to `get_free_symbol_uses()` takes seconds.

Why? Because `get_free_symbol_uses()` may recursively call another `get_free_symbol_uses()`, which can end up running many times.

```python
result = self.layout.get_free_symbol_uses(
    unbacked_only
) | self.data.get_free_symbol_uses(unbacked_only)
```

This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed.
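
For illustration, here is a minimal, self-contained sketch of the caching idea (all names here, such as `cache_free_symbol_uses` and `ToyNode`, are hypothetical and not the PR's actual code): memoize the result per node identity and `unbacked_only` flag, assuming a node's free-symbol uses are stable once the node has been built.

```python
# Minimal sketch of the caching idea, not the PR's implementation. Assumes a
# node's free-symbol uses do not change after the node is constructed.
import functools

import sympy


def cache_free_symbol_uses(fn):
    cache: dict = {}  # keyed on (node identity, unbacked_only)

    @functools.wraps(fn)
    def wrapper(self, unbacked_only: bool = False) -> set:
        key = (id(self), unbacked_only)
        if key not in cache:
            cache[key] = fn(self, unbacked_only)
        return cache[key]

    return wrapper


class ToyNode:
    """Toy IR node: its free symbols are its own plus those of its input chain."""

    def __init__(self, symbols: set, data: "ToyNode | None" = None):
        self.symbols = symbols
        self.data = data

    @cache_free_symbol_uses
    def get_free_symbol_uses(self, unbacked_only: bool = False) -> set:
        # Without the cache, querying many nodes re-walks overlapping data
        # chains over and over, which is what made large graphs slow.
        result = set(self.symbols)
        if self.data is not None:
            result |= self.data.get_free_symbol_uses(unbacked_only)
        return result


# Example: a chain of 500 nodes; querying every node (as the scheduler does)
# is quadratic without the cache but linear with it.
nodes, chain = [], None
for i in range(500):
    chain = ToyNode({sympy.Symbol(f"s{i}")}, data=chain)
    nodes.append(chain)
all_symbols = set().union(*(n.get_free_symbol_uses() for n in nodes))
assert len(all_symbols) == 500
```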

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@BoyuanFeng added the ciflow/trunk (Trigger trunk jobs on your pull request), topic: not user facing (topic category), and module: inductor labels Oct 27, 2025
@pytorch-bot (bot) commented Oct 27, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166338

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0d670c8 with merge base 365ed62:

💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@laithsakka (Contributor) commented Oct 28, 2025 (edited)

Seems reasonable. Are inductor nodes immutable? @eellison
If not, I wonder whether we can do this optimization in a safer way: within a context where I know the nodes are not changing, I can cache. That would depend on the torchtitan case.
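
A hedged sketch of that idea (the helpers `assume_nodes_frozen` and `cached_free_symbol_uses` are hypothetical, not existing PyTorch API): a cache is consulted only while the caller is inside an explicit context that promises the nodes are not mutating, and is discarded on exit.

```python
# Hypothetical sketch of context-scoped caching, not existing PyTorch code.
import contextlib
from typing import Any, Dict, Iterator, Optional, Tuple

_free_symbol_cache: Optional[Dict[Tuple[int, bool], Any]] = None


@contextlib.contextmanager
def assume_nodes_frozen() -> Iterator[None]:
    """Enable caching only while the caller guarantees nodes are immutable."""
    global _free_symbol_cache
    _free_symbol_cache = {}
    try:
        yield
    finally:
        _free_symbol_cache = None  # nodes may change again; forget everything


def cached_free_symbol_uses(node: Any, unbacked_only: bool = False) -> Any:
    # Outside the freeze context, fall through to the normal (uncached) call.
    if _free_symbol_cache is None:
        return node.get_free_symbol_uses(unbacked_only)
    key = (id(node), unbacked_only)
    if key not in _free_symbol_cache:
        _free_symbol_cache[key] = node.get_free_symbol_uses(unbacked_only)
    return _free_symbol_cache[key]
```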

```python
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> RV:
    key = (id(self), args, tuple(sorted(kwargs.items())))
    if key not in cache:
        cache[key] = fn(self, *args, **kwargs)
    return cache[key]
```
Looking at how `cache_on_self` was implemented, I think we should do something similar here to further improve the performance.

@eellison (Contributor) left a comment:

Right, flexible layout might change stride, although it's unlikely it would induce a new symbol use. Would it be safer to only cache if the layout is fixed?


```python
@offset.setter
def offset(self, value: Expr) -> None:
    self.assert_free_symbol_uses_unchanged("offset", value)
```
@BoyuanFeng (Contributor, Author) replied:

Error if free symbols are added or deleted after initialization.
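
A rough standalone sketch of what such a guard could look like (hypothetical free function; in the PR the check is a method on the object that owns the `offset` property and compares against symbols recorded at initialization): compare the free symbols of the old and new expressions and raise if any were added or removed.

```python
# Hypothetical standalone sketch of the invariant check, not the PR's code.
import sympy


def assert_free_symbol_uses_unchanged(name: str, old: sympy.Expr, new: sympy.Expr) -> None:
    """Raise if assigning `new` would add or remove free symbols relative to `old`."""
    old_syms, new_syms = old.free_symbols, new.free_symbols
    if old_syms != new_syms:
        raise AssertionError(
            f"free symbols of {name!r} changed after initialization: "
            f"added={new_syms - old_syms}, removed={old_syms - new_syms}"
        )
```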

@eellison (Contributor) left a comment:

Sorry, one last question: at the point we call it, all these nodes should have fixed layout. Should we just only cache in the fixed-layout case? I think that will be a bit simpler.

@desertfire (Contributor) commented:

> Sorry, one last question: at the point we call it, all these nodes should have fixed layout. Should we just only cache in the fixed-layout case? I think that will be a bit simpler.

That is simpler but will probably have performance implications. @BoyuanFeng, I wonder how much of a performance difference it will be.

@atalman (Contributor) commented:

@pytorchmergebot revert -c nosignal -m "Failure: test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_overflow_values GH job link HUD commit link"


@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchmergebot (Collaborator) commented:

@BoyuanFeng your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 31, 2025

@pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels Oct 31, 2025
@BoyuanFeng (Contributor, Author) commented:

@atalman the failure is not related to this PR. I also cannot repro it locally. Let me rebase and try CI again.


@BoyuanFeng (Contributor, Author) commented:

@pytorchbot merge


@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging: Check the merge workflow status here.

BoyuanFeng added a commit that referenced this pull request Oct 31, 2025
BoyuanFeng pushed a commit that referenced this pull request Oct 31, 2025

BoyuanFeng added a commit that referenced this pull request Oct 31, 2025
etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025
etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025

etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025
@BoyuanFeng (Contributor, Author) commented:

@pytorchbot cherry-pick --onto release/2.9 --fixes "Inductor partition compilation infinite hang issue introduced in 2.9.0 breaking torchtitan" -c fixnewfeature


@pytorchbot (Collaborator) commented:

Cherry picking #166338

Command `git -C /home/runner/work/pytorch/pytorch cherry-pick -x dfebdcab86acbaa0eaa996b47595e5f27a66492e` returned non-zero exit code 1

```
Auto-merging test/inductor/test_torchinductor.py
Auto-merging torch/_inductor/ir.py
CONFLICT (content): Merge conflict in torch/_inductor/ir.py
Auto-merging torch/_inductor/utils.py
CONFLICT (content): Merge conflict in torch/_inductor/utils.py
error: could not apply dfebdcab86a... [GraphPartition] cache get_free_symbol_uses (#166338)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
```

Details for Dev Infra team: Raised by workflow job

Lucaskabela pushed a commit that referenced this pull request Nov 4, 2025
Lucaskabela pushed a commit that referenced this pull request Nov 4, 2025
Lucaskabela pushed a commit that referenced this pull request Nov 4, 2025
atalman pushed a commit that referenced this pull request Nov 6, 2025
github-actions (bot) deleted the bf/partition-cache-free-symbols branch December 5, 2025 02:18

Reviewers

@eellison approved these changes
Awaiting requested review from @laithsakka
Awaiting requested review from @desertfire

Assignees

No one assigned

Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: inductor, Reverted, topic: not user facing (topic category)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

8 participants

@BoyuanFeng @laithsakka @desertfire @eellison @pytorchmergebot @atalman @pytorchbot
