NotificationsYou must be signed in to change notification settings
Fork26.3k
Star96k

[DeviceMesh] Fix error in fake-mode + TORCH_DISTRIBUTED_DEBUG#170765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Open

lw wants to merge4 commits intomain

base:main

Choose a base branch

fromlw_mesh_fake_mode

Open

[DeviceMesh] Fix error in fake-mode + TORCH_DISTRIBUTED_DEBUG#170765

lw wants to merge4 commits intomainfromlw_mesh_fake_mode

+12 −11

Conversation

Copy link

Contributor

pytorch-botbot commentedDec 18, 2025•
edited
Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results athud.pytorch.org/pr/170765

📄 PreviewPython docs built from this PR
📄 PreviewC++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit thebot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit7b776e3 with merge basedbba85b ():

BROKEN TRUNK - The following job failed but were present on the merge base:

👉Rebase onto the `viable/strict` branch to avoid these failures

pull / linux-jammy-py3.10-clang12-onnx / test (default, 2, 2, lf.linux.c7i.2xlarge) (gh) (trunk failure)
test/onnx/ops/test_ops.py::NativeOnnxOpsTest::test_attention_export_gqa

This comment was automatically generated by Dr. CI and updates every 15 minutes.

lw added3 commits

December 18, 2025 15:04

Update device_mesh.py

e724909

Update _mesh_layout.py

f73e62b

Update device_mesh.py

7b776e3

wconstab reviewed

Dec 18, 2025

View reviewed changes

torch/distributed/device_mesh.py

		"either have all its original dimensions (e.g., no slicing) "
		"or it needs to contain the local rank"
		)
		withtorch._subclasses.fake_tensor.unset_fake_temporarily():

Copy link

Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

so what is the behavior of tracing a call to .mesh in dynamo?

Copy link

ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The way I see it, it should not be part of the public interface that DeviceMesh is internally using Tensors and, in fact, it's doing so less and less (thanks to CuTe layouts) and themesh property is mainly a legacy feature.

Moreover, I expect that all the operations on DeviceMeshes end up being completely "desugared" in the Dynamo graph, after they have helped introduce the correct collectives.

Finally, DeviceMeshe's internal Tensors arealways on CPU, and their values are never data-dependent or anything, thus I don't see any point in supporting fake tensors...