
[GraphPartition] cache get_free_symbol_uses #166338


Closed
BoyuanFeng wants to merge 14 commits into main from bf/partition-cache-free-symbols

Conversation

@BoyuanFeng (Contributor) commented Oct 27, 2025 (edited by pytorch-bot)

Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs.

```python
def get_scheduler_node_symbol_uses(
    node: BaseSchedulerNode,
) -> OrderedSet[sympy.Symbol]:
    """
    Gets symbols used in node.
    """
    if isinstance(node, FusedSchedulerNode):
        return OrderedSet().union(
            *(get_scheduler_node_symbol_uses(snode) for snode in node.snodes)
        )
    assert node.node is not None
    free_symbol_uses = node.node.get_free_symbol_uses()
    free_symbol_uses.update(
        *(get_layout_symints(ir_node) for ir_node in node.node.get_outputs())
    )
    return free_symbol_uses
```

I empirically observed that `get_free_symbol_uses()` becomes slower for larger graphs. Specifically, I tried an aten fallback for torchtitan, which results in 10k+ aten nodes. When processing the 600th node, a single call to `get_free_symbol_uses()` takes seconds.

Why? Because `get_free_symbol_uses()` may recursively call another `get_free_symbol_uses()`, which can end up running many times.

```python
result = self.layout.get_free_symbol_uses(
    unbacked_only
) | self.data.get_free_symbol_uses(unbacked_only)
```

This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed.
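
For illustration, here is a minimal, self-contained sketch of the caching idea (all names here, such as `cache_free_symbol_uses` and `ToyNode`, are hypothetical and not the PR's actual code): memoize the result per node identity and `unbacked_only` flag, assuming a node's free-symbol uses are stable once the node has been built.

```python
# Minimal sketch of the caching idea, not the PR's implementation. Assumes a
# node's free-symbol uses do not change after the node is constructed.
import functools

import sympy


def cache_free_symbol_uses(fn):
    cache: dict = {}  # keyed on (node identity, unbacked_only)

    @functools.wraps(fn)
    def wrapper(self, unbacked_only: bool = False) -> set:
        key = (id(self), unbacked_only)
        if key not in cache:
            cache[key] = fn(self, unbacked_only)
        return cache[key]

    return wrapper


class ToyNode:
    """Toy IR node: its free symbols are its own plus those of its input chain."""

    def __init__(self, symbols: set, data: "ToyNode | None" = None):
        self.symbols = symbols
        self.data = data

    @cache_free_symbol_uses
    def get_free_symbol_uses(self, unbacked_only: bool = False) -> set:
        # Without the cache, querying many nodes re-walks overlapping data
        # chains over and over, which is what made large graphs slow.
        result = set(self.symbols)
        if self.data is not None:
            result |= self.data.get_free_symbol_uses(unbacked_only)
        return result


# Example: a chain of 500 nodes; querying every node (as the scheduler does)
# is quadratic without the cache but linear with it.
nodes, chain = [], None
for i in range(500):
    chain = ToyNode({sympy.Symbol(f"s{i}")}, data=chain)
    nodes.append(chain)
all_symbols = set().union(*(n.get_free_symbol_uses() for n in nodes))
assert len(all_symbols) == 500
```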

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben

@BoyuanFeng added the ciflow/trunk (Trigger trunk jobs on your pull request), topic: not user facing (topic category), and module: inductor labels Oct 27, 2025
@pytorch-bot (bot) commented Oct 27, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166338

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 0d670c8 with merge base 365ed62:

💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@laithsakka (Contributor) commented Oct 28, 2025 (edited)

Seems reasonable. Are inductor nodes immutable? @eellison
If not, I wonder whether we can do this optimization in a safer way: within a context where I know the nodes are not changing, I can cache. That would depend on the torchtitan case.
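
A hedged sketch of that idea (the helpers `assume_nodes_frozen` and `cached_free_symbol_uses` are hypothetical, not existing PyTorch API): a cache is consulted only while the caller is inside an explicit context that promises the nodes are not mutating, and is discarded on exit.

```python
# Hypothetical sketch of context-scoped caching, not existing PyTorch code.
import contextlib
from typing import Any, Dict, Iterator, Optional, Tuple

_free_symbol_cache: Optional[Dict[Tuple[int, bool], Any]] = None


@contextlib.contextmanager
def assume_nodes_frozen() -> Iterator[None]:
    """Enable caching only while the caller guarantees nodes are immutable."""
    global _free_symbol_cache
    _free_symbol_cache = {}
    try:
        yield
    finally:
        _free_symbol_cache = None  # nodes may change again; forget everything


def cached_free_symbol_uses(node: Any, unbacked_only: bool = False) -> Any:
    # Outside the freeze context, fall through to the normal (uncached) call.
    if _free_symbol_cache is None:
        return node.get_free_symbol_uses(unbacked_only)
    key = (id(node), unbacked_only)
    if key not in _free_symbol_cache:
        _free_symbol_cache[key] = node.get_free_symbol_uses(unbacked_only)
    return _free_symbol_cache[key]
```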

```python
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> RV:
    key = (id(self), args, tuple(sorted(kwargs.items())))
    if key not in cache:
        cache[key] = fn(self, *args, **kwargs)
    return cache[key]
```
Looking at how `cache_on_self` was implemented, I think we should do something similar here to further improve the performance.

@eellison (Contributor) left a comment:

Right, flexible layout might change stride, although it's unlikely it would induce a new symbol use. Would it be safer to only cache if the layout is fixed?


```python
@offset.setter
def offset(self, value: Expr) -> None:
    self.assert_free_symbol_uses_unchanged("offset", value)
```
@BoyuanFeng (Contributor, Author) replied:

Error if free symbols are added or deleted after initialization.
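
A rough standalone sketch of what such a guard could look like (hypothetical free function; in the PR the check is a method on the object that owns the `offset` property and compares against symbols recorded at initialization): compare the free symbols of the old and new expressions and raise if any were added or removed.

```python
# Hypothetical standalone sketch of the invariant check, not the PR's code.
import sympy


def assert_free_symbol_uses_unchanged(name: str, old: sympy.Expr, new: sympy.Expr) -> None:
    """Raise if assigning `new` would add or remove free symbols relative to `old`."""
    old_syms, new_syms = old.free_symbols, new.free_symbols
    if old_syms != new_syms:
        raise AssertionError(
            f"free symbols of {name!r} changed after initialization: "
            f"added={new_syms - old_syms}, removed={old_syms - new_syms}"
        )
```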

@eellison (Contributor) left a comment:

Sorry, one last question: at the point we call it, all these nodes should have fixed layout. Should we just only cache in the fixed-layout case? I think that will be a bit simpler.

@desertfire (Contributor) commented:

> Sorry, one last question: at the point we call it, all these nodes should have fixed layout. Should we just only cache in the fixed-layout case? I think that will be a bit simpler.

That is simpler but will probably have performance implications. @BoyuanFeng, I wonder how much of a performance difference it will be.

@atalman (Contributor) commented:

@pytorchmergebot revert -c nosignal -m "Failure: test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_overflow_values GH job link HUD commit link"


@pytorchmergebot (Collaborator) commented:

@pytorchbot successfully started a revert job. Check the current status here.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

@pytorchmergebot (Collaborator) commented:

@BoyuanFeng your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Oct 31, 2025

@pytorchmergebot added the Reverted and ci-no-td (Do not run TD on this PR) labels Oct 31, 2025
@BoyuanFeng (Contributor, Author) commented:

@atalman the failure is not related to this PR. I also cannot repro it locally. Let me rebase and try CI again.


@BoyuanFeng (Contributor, Author) commented:

@pytorchbot merge


@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team.

Advanced Debugging: Check the merge workflow status here.

BoyuanFeng added a commit that referenced this pull request Oct 31, 2025
BoyuanFeng pushed a commit that referenced this pull request Oct 31, 2025

BoyuanFeng added a commit that referenced this pull request Oct 31, 2025
etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025
etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025

etaf pushed a commit to etaf/pytorch-inductor-xpu that referenced this pull request Nov 4, 2025
@BoyuanFeng (Contributor, Author) commented:

@pytorchbot cherry-pick --onto release/2.9 --fixes "Inductor partition compilation infinite hang issue introduced in 2.9.0 breaking torchtitan" -c fixnewfeature


@pytorchbot (Collaborator) commented:

Cherry picking #166338

Command `git -C /home/runner/work/pytorch/pytorch cherry-pick -x dfebdcab86acbaa0eaa996b47595e5f27a66492e` returned non-zero exit code 1

```
Auto-merging test/inductor/test_torchinductor.py
Auto-merging torch/_inductor/ir.py
CONFLICT (content): Merge conflict in torch/_inductor/ir.py
Auto-merging torch/_inductor/utils.py
CONFLICT (content): Merge conflict in torch/_inductor/utils.py
error: could not apply dfebdcab86a... [GraphPartition] cache get_free_symbol_uses (#166338)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
```

Details for Dev Infra team: Raised by workflow job

Lucaskabela pushed a commit that referenced this pull request Nov 4, 2025
Lucaskabela pushed a commit that referenced this pull request Nov 4, 2025
Lucaskabela pushed a commit that referenced this pull request Nov 4, 2025
atalman pushed a commit that referenced this pull request Nov 6, 2025
github-actions (bot) deleted the bf/partition-cache-free-symbols branch December 5, 2025 02:18

Reviewers

@eellison approved these changes
Awaiting requested review from @laithsakka
Awaiting requested review from @desertfire

Assignees

No one assigned

Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: inductor, Reverted, topic: not user facing (topic category)

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

8 participants

@BoyuanFeng @laithsakka @desertfire @eellison @pytorchmergebot @atalman @pytorchbot
