[GraphPartition] cache get_free_symbol_uses #166338
Conversation
pytorch-bot (bot) commented Oct 27, 2025 • edited
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166338
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 0d670c8 with merge base 365ed62. This comment was automatically generated by Dr. CI and updates every 15 minutes.
laithsakka commented Oct 28, 2025 • edited
Seems reasonable. Are inductor nodes immutable? @eellison
torch/_inductor/utils.py Outdated
def wrapper(self: Any, *args: P.args, **kwargs: P.kwargs) -> RV:
    key = (id(self), args, tuple(sorted(kwargs.items())))
    if key not in cache:
        cache[key] = fn(self, *args, **kwargs)
Looking at how cache_on_self was implemented, I think we should do something similar here to further improve the performance.
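For reference, a minimal sketch of what an instance-level cache in the spirit of `cache_on_self` could look like. The decorator and attribute names below are illustrative, not the actual `torch/_inductor/utils.py` implementation:

```python
import functools
from typing import Any, Callable, TypeVar

RV = TypeVar("RV")


def cache_method_on_self(fn: Callable[..., RV]) -> Callable[..., RV]:
    # Store the cache on the instance itself (like cache_on_self) so it is
    # collected together with the node, instead of keeping a module-level
    # dict keyed by id(self).
    attr = f"_{fn.__name__}_cache"

    @functools.wraps(fn)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> RV:
        cache = getattr(self, attr, None)
        if cache is None:
            cache = {}
            setattr(self, attr, cache)
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = fn(self, *args, **kwargs)
        return cache[key]

    return wrapper
```

Besides tying the cache lifetime to the node, this avoids keying a shared dict on `id(self)`, which can collide when objects are freed and their ids reused.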
eellison left a comment
Right, flexible layout might change stride, although it's unlikely it would induce a new symbol use. Would it be safer to only cache if the layout is fixed?
@offset.setter
def offset(self, value: Expr) -> None:
    self.assert_free_symbol_uses_unchanged("offset", value)
Error if free symbols are added or deleted after initialization.
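A rough sketch of what that guard could look like, intended as a method on the IR node class. The method name `assert_free_symbol_uses_unchanged` comes from the diff above, but the body and the `_free_symbol_uses_cached` flag are assumptions for illustration:

```python
import sympy


def assert_free_symbol_uses_unchanged(self, name: str, value: sympy.Expr) -> None:
    # Illustrative sketch: once get_free_symbol_uses() has been cached, refuse
    # a mutation that would add or delete free symbols.
    old = getattr(self, name, None)
    old_syms = old.free_symbols if isinstance(old, sympy.Expr) else set()
    new_syms = value.free_symbols if isinstance(value, sympy.Expr) else set()
    if getattr(self, "_free_symbol_uses_cached", False) and old_syms != new_syms:
        raise AssertionError(
            f"free symbols of {name!r} changed after caching: {old_syms} -> {new_syms}"
        )
```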
eellison left a comment
Sorry, one last question: at the point we call it, all these nodes should have a fixed layout. Should we just only cache in the fixed-layout case? I think that will be a bit simpler.
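A sketch of that variant: only memoize once the layout can no longer change. `has_fixed_layout` here is a hypothetical predicate standing in for an `isinstance(self.layout, FixedLayout)`-style check in Inductor:

```python
import functools
from typing import Any, Callable, TypeVar

RV = TypeVar("RV")


def cache_if_fixed_layout(fn: Callable[..., RV]) -> Callable[..., RV]:
    @functools.wraps(fn)
    def wrapper(self: Any, *args: Any, **kwargs: Any) -> RV:
        if not has_fixed_layout(self):
            # Flexible layout: strides may still change, so recompute every time.
            return fn(self, *args, **kwargs)
        cache = self.__dict__.setdefault("_free_symbol_uses_cache", {})
        key = (args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = fn(self, *args, **kwargs)
        return cache[key]

    return wrapper


def has_fixed_layout(node: Any) -> bool:
    # Placeholder check; in Inductor this would inspect node.layout directly.
    layout = getattr(node, "layout", None)
    return layout is not None and type(layout).__name__ == "FixedLayout"
```

The trade-off raised in the next comment is that skipping the cache for flexible layouts keeps more of the original cost, which is why the performance impact was worth measuring.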
desertfire commented Oct 30, 2025
That is simpler, but it probably will have a performance implication. @BoyuanFeng, I wonder how much the performance difference will be.
atalman commented Oct 31, 2025
@pytorchmergebot revert -c nosignal -m "Failure: test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_overflow_values GH job link HUD commit link"
pytorchmergebot commented Oct 31, 2025
@pytorchbot successfully started a revert job. Check the current status here.
pytorchmergebot commented Oct 31, 2025
@BoyuanFeng your PR has been successfully reverted.
This reverts commit a6b1ef1. Reverted #166338 on behalf of https://github.com/atalman due to Failure: test/nn/test_convolution.py::TestConvolutionNN::test_conv3d_overflow_values [GH job link](https://github.com/pytorch/pytorch/actions/runs/18961173726/job/54149112920) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/a6b1ef17173f56ba93ac97ff4384fa4060b5e41e) ([comment](#166338 (comment)))
BoyuanFeng commented Oct 31, 2025
@atalman the failure is not related to this PR. I also cannot repro locally. Let me rebase and try CI again.
BoyuanFeng commented Oct 31, 2025
@pytorchbot merge
pytorchmergebot commented Oct 31, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs (https://github.com/pytorch/pytorch/blob/ee7434be822cf6e75b4566d8159f550ee233d8ae/torch/_inductor/scheduler.py#L4869-L4885). I empirically observed that `get_free_symbol_uses()` becomes slower for larger graphs. Specifically, I tried the aten fallback for torchtitan, which results in 10k+ aten nodes. When processing the 600th node, it takes seconds to run `get_free_symbol_uses()` for a single node. Why? Because `get_free_symbol_uses()` may recursively call another `get_free_symbol_uses()`, which can itself recurse many times (https://github.com/pytorch/pytorch/blob/ee7434be822cf6e75b4566d8159f550ee233d8ae/torch/_inductor/ir.py#L4541-L4543). This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed. Pull Request resolved: #166338. Approved by: https://github.com/eellison
BoyuanFeng commented Nov 4, 2025
@pytorchbot cherry-pick --onto release/2.9 --fixes "Inductor partition compilation infinite hang issue introduced in 2.9.0 breaking torchtitan" -c fixnewfeature
pytorchbot commented Nov 4, 2025
Cherry picking #166338. Command details for the Dev Infra team: raised by workflow job.
Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs (https://github.com/pytorch/pytorch/blob/ee7434be822cf6e75b4566d8159f550ee233d8ae/torch/_inductor/scheduler.py#L4869-L4885). I empirically observed that `get_free_symbol_uses()` becomes slower for larger graphs. Specifically, I tried the aten fallback for torchtitan, which results in 10k+ aten nodes. When processing the 600th node, it takes seconds to run `get_free_symbol_uses()` for a single node. Why? Because `get_free_symbol_uses()` may recursively call another `get_free_symbol_uses()`, which can itself recurse many times (https://github.com/pytorch/pytorch/blob/ee7434be822cf6e75b4566d8159f550ee233d8ae/torch/_inductor/ir.py#L4541-L4543). This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed. Pull Request resolved: #166338. Approved by: https://github.com/eellison (cherry picked from commit dfebdca) Co-authored-by: Boyuan Feng <boyuan@meta.com>

Uh oh!
There was an error while loading.Please reload this page.
Graph partition relies on `get_free_symbol_uses()` to collect symbol inputs (pytorch/torch/_inductor/scheduler.py, lines 4869 to 4885 in ee7434b).

I empirically observed that `get_free_symbol_uses()` becomes slower for larger graphs. Specifically, I tried the aten fallback for torchtitan, which results in 10k+ aten nodes. When processing the 600th node, it takes seconds to run `get_free_symbol_uses()` for a single node. Why? Because `get_free_symbol_uses()` may recursively call another `get_free_symbol_uses()`, which can itself recurse many times (pytorch/torch/_inductor/ir.py, lines 4541 to 4543 in ee7434b).

This PR fixes the issue by caching the results of `get_free_symbol_uses()`. I validated on torchtitan that the issue is fixed.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben
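To make the caching idea concrete, here is a minimal sketch on a toy node class. It is a stand-in, not the actual `torch/_inductor/ir.py` classes, and it ignores details such as the real method's arguments and cache invalidation on mutation:

```python
from typing import Optional

import sympy


class ToyIRNode:
    """Illustrative stand-in for an Inductor IR node."""

    def __init__(self, inputs: list["ToyIRNode"], exprs: list[sympy.Expr]) -> None:
        self.inputs = inputs  # upstream nodes whose free symbols we also depend on
        self.exprs = exprs    # expressions held directly (sizes, strides, offset, ...)
        self._free_symbol_uses: Optional[frozenset[sympy.Symbol]] = None

    def get_free_symbol_uses(self) -> frozenset[sympy.Symbol]:
        # Cached after the first call, so a node shared by many downstream
        # users is walked once instead of once per user.
        if self._free_symbol_uses is None:
            symbols: set[sympy.Symbol] = set()
            for expr in self.exprs:
                symbols |= expr.free_symbols
            for inp in self.inputs:
                symbols |= inp.get_free_symbol_uses()
            self._free_symbol_uses = frozenset(symbols)
        return self._free_symbol_uses
```

Without the cache, the recursion re-walks shared upstream subgraphs for every downstream node, which is what made a single call take seconds around the 600th node of the 10k+-node torchtitan graph.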