How does DDP work under the hood in Lightning? #20917
-
I'm trying to build a pipeline with Hydra + Lightning, but Lightning seems to spawn multiple processes and run the script multiple times (as torchrun would), which causes unwanted side effects. I want to understand how this works under the hood so I can fix the problem. (The problem itself is probably not caused by Lightning or Hydra.) I know how DDP works in plain PyTorch, so I tried digging into the source code, which only left me more confused. Does anyone know how Lightning (or Fabric) handles DDP, especially without torchrun?
Replies: 2 comments
-
Same issue here.
-
Could you please share a minimal reproducible script that demonstrates the unwanted side effects? That will help us narrow down the issue and suggest a more accurate solution. Also, as far as I understand, with the default DDP strategy Lightning launches the additional worker processes itself by re-running your script once per device (similar in effect to torchrun), so you may want to try running it as a regular Python script instead of using the torchrun command directly.
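In the meantime, if the side effects come from code that runs once per process (creating directories, writing configs, starting loggers), one common workaround is to guard that code so it only runs in the rank-0 process. Below is a minimal sketch in plain Python. It assumes the worker processes are started with a `LOCAL_RANK` environment variable while the process you launched by hand has none; `local_rank`, `run_on_rank_zero`, and `make_output_dir` are hypothetical helper names for illustration, not Lightning APIs.

```python
import os


def local_rank() -> int:
    """Return this process's local rank, defaulting to 0."""
    # Assumption: relaunched worker processes have LOCAL_RANK set in their
    # environment; the process started by hand does not, so it counts as 0.
    return int(os.environ.get("LOCAL_RANK", "0"))


def run_on_rank_zero(fn, *args, **kwargs):
    """Call fn only in the rank-0 process; return None in every other process."""
    if local_rank() == 0:
        return fn(*args, **kwargs)
    return None


def make_output_dir(path: str) -> str:
    # Hypothetical side effect that should happen once, not once per GPU.
    os.makedirs(path, exist_ok=True)
    return path
```

You would then call something like `run_on_rank_zero(make_output_dir, "outputs/run1")` near the top of the script. Lightning also ships a `rank_zero_only` decorator in `lightning.pytorch.utilities` that serves the same purpose, which may be preferable to rolling your own.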