[FSDP2] respect reshard_after_forward=True for root model #154704
Conversation
pytorch-bot bot commented May 30, 2025 • edited
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/154704
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 2bbef5b with merge base 0c6c778.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
weifengpy commented May 30, 2025
@weifengpy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
weifengpy commented Jun 2, 2025
@weifengpy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
weifengpy commented Jun 2, 2025
@weifengpy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
setattr(module, param_name, param)
fully_shard(model)
# need to fix reshard_after_forward=True
# https://github.com/pytorch/pytorch/issues/154836
this PR reveals an existing bug in the extension. We should follow up separately and change it back to `reshard_after_forward=True`
weifengpy commented Jun 2, 2025
@weifengpy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
weifengpy commented Jun 2, 2025
@weifengpy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Shall we remove the comment at line 118, `model(inp)  # root does not reshard after forward`, under `test_param_registration_after_forward`?
good point. I will update the PR. anything else?
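For reference, here is a minimal self-contained sketch of what that check can assert after this change (assumed module, shapes, and launch setup; not the actual test code): the root's parameters should be sharded `DTensor`s again as soon as forward returns.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import fully_shard
from torch.distributed.tensor import DTensor

# assumes a torchrun launch so a process group is available
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Linear(16, 16, device="cuda")  # hypothetical stand-in model
fully_shard(model, reshard_after_forward=True)

model(torch.randn(2, 16, device="cuda"))
# With this PR, the root reshards after forward, so the old
# "root does not reshard after forward" comment is stale and the test
# can assert the sharded representation directly.
for p in model.parameters():
    assert isinstance(p, DTensor)

dist.destroy_process_group()
```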
mori360 left a comment
Thanks for the PR! LGTM
weifengpy commented Jun 3, 2025
@pytorchmergebot merge
pytorchmergebot commented Jun 3, 2025
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…4704)
resolve pytorch#154655
`fully_shard(root, reshard_after_forward=True)` didn't really reshard parameters after forward, because we assumed the root model would be used in backward immediately. The assumption becomes invalid in 2 cases:
* we have 3 roots for CLIP, T5, and FLUX; we should reshard the parameters of CLIP and T5 immediately after their forward
* for recommendation models, we may have multiple roots for the dense part
Change the default behavior to always respect `reshard_after_forward=True`.
Differential Revision: [D75663200](https://our.internmc.facebook.com/intern/diff/D75663200)
Pull Request resolved: pytorch#154704
Approved by: https://github.com/mori360
resolve #154655
Stack from ghstack (oldest at bottom):
`fully_shard(root, reshard_after_forward=True)` didn't really reshard parameters after forward, because we assumed the root model would be used in backward immediately. The assumption becomes invalid in 2 cases:

* we have 3 roots for CLIP, T5, and FLUX; we should reshard the parameters of CLIP and T5 immediately after their forward
* for recommendation models, we may have multiple roots for the dense part
Change the default behavior to always respect `reshard_after_forward=True`.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k
Differential Revision: D75663200
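To make the multi-root case concrete, here is a minimal sketch. It is not code from this PR: the three `nn.Linear` modules are made-up stand-ins for the real CLIP, T5, and FLUX models, and it assumes a `torchrun` launch with an NCCL process group.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import fully_shard

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Stand-ins for the three models; each top-level fully_shard call
# creates its own FSDP root.
clip = nn.Linear(512, 512, device="cuda")
t5 = nn.Linear(512, 512, device="cuda")
flux = nn.Linear(512, 512, device="cuda")
for root in (clip, t5, flux):
    # Before this PR, a root kept its parameters unsharded after forward
    # even with reshard_after_forward=True; after it, the flag is respected.
    fully_shard(root, reshard_after_forward=True)

x = torch.randn(4, 512, device="cuda")
with torch.no_grad():
    emb = t5(clip(x))  # the encoders reshard right after their forward
out = flux(emb)        # only flux participates in backward
out.sum().backward()

dist.destroy_process_group()
```

With the old behavior, the encoder roots would have kept their unsharded parameters alive through the whole step even though they never run backward; resharding them right after forward frees that memory for the module that does.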