Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[c10d][fr] Split cuda and non-cuda fr logic into two cpp file#154929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Closed
fduwjj wants to merge4 commits intogh/fduwjj/143/basefromgh/fduwjj/143/head

Conversation

@fduwjj
Copy link
Contributor

@fduwjjfduwjj commentedJun 2, 2025
edited
Loading

Stack fromghstack (oldest at bottom):

During the integration fr with gloo I found that put all logic inside one cpp with both build Macro does not work in the current linkage set up in the bazil file. If we put the cpp in the libtorch_cpu, then cuda side build will fail, if we put both we get complaint about ld.lld: error: duplicate symbol: typeinfo for c10d::DebugInfoWriter. To fix this, we need to move the common logic into another header file and we use different cpp file for cpu and cuda so that fr can be used in both cases.

cc@H-Huang@awgu@wanchaol@fegin@wz337@wconstab@d4l3k

Differential Revision:D75877197

@pytorch-bot
Copy link

pytorch-botbot commentedJun 2, 2025
edited
Loading

🔗 Helpful Links

🧪 See artifacts and rendered test results athud.pytorch.org/pr/154929

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit6674381 with merge base0d0058d (image):
💚 Looks good so far! There are no failures yet. 💚

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-botpytorch-botbot added oncall: distributedAdd this issue/PR to distributed oncall triage queue release notes: distributed (c10d)release notes category labelsJun 2, 2025
cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k[ghstack-poisoned]
cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k[ghstack-poisoned]
@fduwjjfduwjj requested review fromd4l3k andkwen2501June 2, 2025 23:20
@fduwjjfduwjj changed the title[c10][fr] Split cuda and non-cuda fr logic into two cpp file[c10d][fr] Split cuda and non-cuda fr logic into two cpp fileJun 2, 2025
Copy link
Collaborator

@kwen2501kwen2501 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@fduwjjfduwjj added the ciflow/trunkTrigger trunk jobs on your pull request labelJun 3, 2025
…ile"During the integration fr with gloo I found that put all logic inside one cpp with both build Macro does not work in the current linkage set up in the bazil file. If we put the cpp in the libtorch_cpu, then cuda side build will fail, if we put both we get complaint about  ld.lld: error: duplicate symbol: typeinfo for c10d::DebugInfoWriter. To fix this, we need to move the common logic into another header file and we use different cpp file for cpu and cuda so that fr can be used in both cases.cc H-Huang awgu wanchaol fegin wz337 wconstab d4l3k[ghstack-poisoned]
fduwjj added a commit that referenced this pull requestJun 3, 2025
@fduwjj
Copy link
ContributorAuthor

@pytorchbot merge

pytorch-bot[bot] reacted with thumbs up emoji

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in thewiki.

Questions? Feedback? Please reach out to thePyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@fduwjj
Copy link
ContributorAuthor

@fduwjj has imported this pull request. If you are a Meta employee, you can view this diffon Phabricator.

pytorchmergebot pushed a commit that referenced this pull requestJun 3, 2025
This is a first quick prototyping for FR integration for gloo. Few features gaps:- Input/Output numels for each collective- Whether to use c10::Event or where to use it.- Where to dump the FR traces. (The dump api is provided in this PR)Differential Revision: [D75803601](https://our.internmc.facebook.com/intern/diff/D75803601)Pull Requestresolved:#152614Approved by:https://github.com/d4l3kghstack dependencies:#154929
iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull requestJun 4, 2025
…h#154929)During the integration fr with gloo I found that put all logic inside one cpp with both build Macro does not work in the current linkage set up in the bazil file. If we put the cpp in the libtorch_cpu, then cuda side build will fail, if we put both we get complaint about  ld.lld: error: duplicate symbol: typeinfo for c10d::DebugInfoWriter. To fix this, we need to move the common logic into another header file and we use different cpp file for cpu and cuda so that fr can be used in both cases.Pull Requestresolved:pytorch#154929Approved by:https://github.com/kwen2501
iupaikov-amd pushed a commit to ROCm/pytorch that referenced this pull requestJun 4, 2025
This is a first quick prototyping for FR integration for gloo. Few features gaps:- Input/Output numels for each collective- Whether to use c10::Event or where to use it.- Where to dump the FR traces. (The dump api is provided in this PR)Differential Revision: [D75803601](https://our.internmc.facebook.com/intern/diff/D75803601)Pull Requestresolved:pytorch#152614Approved by:https://github.com/d4l3kghstack dependencies:pytorch#154929
angelayi pushed a commit to angelayi/pytorch that referenced this pull requestJun 5, 2025
This is a first quick prototyping for FR integration for gloo. Few features gaps:- Input/Output numels for each collective- Whether to use c10::Event or where to use it.- Where to dump the FR traces. (The dump api is provided in this PR)Differential Revision: [D75803601](https://our.internmc.facebook.com/intern/diff/D75803601)Pull Requestresolved:pytorch#152614Approved by:https://github.com/d4l3kghstack dependencies:pytorch#154929
@github-actionsgithub-actionsbot deleted the gh/fduwjj/143/head branchJuly 4, 2025 02:21
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@kwen2501kwen2501kwen2501 approved these changes

@d4l3kd4l3kAwaiting requested review from d4l3k

Assignees

No one assigned

Labels

ciflow/trunkTrigger trunk jobs on your pull requestMergedoncall: distributedAdd this issue/PR to distributed oncall triage queuerelease notes: distributed (c10d)release notes category

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

4 participants

@fduwjj@pytorchmergebot@kwen2501

[8]ページ先頭

©2009-2025 Movatter.jp