[c10d][fr] Split cuda and non-cuda fr logic into two cpp file #154929

Closed

fduwjj wants to merge 4 commits into gh/fduwjj/143/base from gh/fduwjj/143/head

fduwjj commented Jun 2, 2025 (edited)

Stack from ghstack (oldest at bottom):

While integrating FR (Flight Recorder) with gloo, I found that putting all of the logic inside one cpp file, guarded by both build macros, does not work with the current linkage setup in the Bazel file. If we put the cpp file only in libtorch_cpu, the CUDA-side build fails; if we put it in both libraries, the linker complains: ld.lld: error: duplicate symbol: typeinfo for c10d::DebugInfoWriter. To fix this, we move the common logic into a separate header file and use different cpp files for CPU and CUDA, so that FR can be used in both cases.
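The split described above can be sketched in a single file as follows. This is only an illustration under stated assumptions, not the code in this PR: `Recorder`, `CpuEvent`, `FakeCudaEvent`, and the file names in the comments are all hypothetical, and the sketch does not reproduce the duplicate-symbol error itself. It shows the shared logic living in a template (the "header" part) and each backend providing its own explicit instantiation (the two "cpp" parts).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// ---- Shared header part (hypothetical name: recorder_detail.hpp) ----
// The common logic is a template, so each library compiles only the
// instantiation it asks for instead of sharing one non-template definition.
template <typename EventT>
class Recorder {
 public:
  // Store one entry and return its index in the buffer.
  std::size_t record(const std::string& op_name) {
    entries_.push_back(Entry{op_name, EventT{}});
    return entries_.size() - 1;
  }

  // Render every entry as "<backend>:<op>", joined by commas.
  std::string dump() const {
    std::string out;
    for (const Entry& e : entries_) {
      if (!out.empty()) out += ",";
      out += std::string(EventT::backend()) + ":" + e.op_name;
    }
    return out;
  }

 private:
  struct Entry {
    std::string op_name;
    EventT event;
  };
  std::vector<Entry> entries_;
};

// ---- CPU source part (hypothetical name: recorder_cpu.cpp) ----
// Stand-in for a CPU-side event type; an assumption for illustration.
struct CpuEvent {
  static const char* backend() { return "cpu"; }
};
template class Recorder<CpuEvent>;  // instantiated only in the CPU library

// ---- CUDA source part (hypothetical name: recorder_cuda.cpp) ----
// Stand-in for a CUDA event type; no real CUDA API is used here.
struct FakeCudaEvent {
  static const char* backend() { return "cuda"; }
};
template class Recorder<FakeCudaEvent>;  // instantiated only in the CUDA library
```

In a real build, the two instantiation sections would sit in separate source files that each include the shared header, so each library owns exactly one instantiation and neither has to compile the other's event type.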

cc @H-Huang @awgu @wanchaol @fegin @wz337 @wconstab @d4l3k

Differential Revision: D75877197

[c10][fr] Split cuda and non-cuda fr logic into two cpp file

b818342

[ghstack-poisoned]

pytorch-bot bot commented Jun 2, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154929

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 6674381 with merge base 0d0058d:
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

⏳ pull / linux-jammy-py3-clang12-executorch / build (gh) (#150261)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the labels oncall: distributed (Add this issue/PR to distributed oncall triage queue) and release notes: distributed (c10d) (release notes category)

Jun 2, 2025

Update on "[c10][fr] Split cuda and non-cuda fr logic into two cpp file"

6832e5c

cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k [ghstack-poisoned]

Update on "[c10][fr] Split cuda and non-cuda fr logic into two cpp file"

9899369

cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k [ghstack-poisoned]

fduwjj requested review from d4l3k and kwen2501

June 2, 2025 23:20

fduwjj changed the title ~~[c10][fr] Split cuda and non-cuda fr logic into two cpp file~~ [c10d][fr] Split cuda and non-cuda fr logic into two cpp file

Jun 2, 2025

fduwjj mentioned this pull request

Jun 2, 2025

[c10d][gloo] Integrate vendor generic FR into gloo #152614

Closed

kwen2501 approved these changes

Jun 3, 2025

kwen2501 left a comment

LGTM

fduwjj added the ciflow/trunk (Trigger trunk jobs on your pull request) label

Jun 3, 2025

Update on "[c10d][fr] Split cuda and non-cuda fr logic into two cpp file"

6674381

cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k [ghstack-poisoned]

fduwjj added a commit that referenced this pull request

Jun 3, 2025

[c10][fr] Split cuda and non-cuda fr logic into two cpp file

24d1a48

ghstack-source-id: 6484815
Pull Request resolved: #154929

fduwjj commented Jun 3, 2025

@pytorchbot merge

pytorchmergebot added the merging label

Jun 3, 2025

pytorchmergebot commented Jun 3, 2025

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot added the Merged label

Jun 3, 2025

pytorchmergebot closed this in d91c85b

Jun 3, 2025

pytorchmergebot removed the merging label

Jun 3, 2025

fduwjj commented Jun 3, 2025

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pytorchmergebot pushed a commit that referenced this pull request

Jun 3, 2025

[c10d][gloo] Integrate vendor generic FR into gloo (#152614)

ff92b42

This is a first quick prototype of FR integration for gloo. A few feature gaps remain:
- Input/Output numels for each collective
- Whether to use c10::Event or where to use it.
- Where to dump the FR traces. (The dump API is provided in this PR)

Differential Revision: [D75803601](https://our.internmc.facebook.com/intern/diff/D75803601)
Pull Request resolved: #152614
Approved by: https://github.com/d4l3k
ghstack dependencies: #154929

iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request

Jun 4, 2025

[c10d][fr] Split cuda and non-cuda fr logic into two cpp file (pytorch#154929)

740731a

Pull Request resolved: pytorch#154929
Approved by: https://github.com/kwen2501

iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull request

Jun 4, 2025

[c10d][gloo] Integrate vendor generic FR into gloo (pytorch#152614)

7e63c24

Differential Revision: [D75803601](https://our.internmc.facebook.com/intern/diff/D75803601)
Pull Request resolved: pytorch#152614
Approved by: https://github.com/d4l3k
ghstack dependencies: pytorch#154929

angelayi pushed a commit to angelayi/pytorch that referenced this pull request

Jun 5, 2025

[c10d][gloo] Integrate vendor generic FR into gloo (pytorch#152614)

7b668c4

Differential Revision: [D75803601](https://our.internmc.facebook.com/intern/diff/D75803601)
Pull Request resolved: pytorch#152614
Approved by: https://github.com/d4l3k
ghstack dependencies: pytorch#154929

github-actions bot deleted the gh/fduwjj/143/head branch

July 4, 2025 02:21
