Emulating multiple devices with a single GPU #8630
-
Hello, I have a single GPU, but I would like to spawn multiple replicas on that single GPU and train a model with DDP. Of course, each replica would have to use a smaller batch size in order to fit in memory. (For my use case, I am not interested in having a single replica with a large batch size.) I tried passing the same device ID multiple times, but in the end it crashed. Please, is there any way to split a single GPU into multiple replicas with Lightning?
P.S.: Ray has really nice support for fractional GPUs: https://docs.ray.io/en/master/using-ray-with-gpus.html#fractional-gpus. I've never used them with Lightning, but maybe it could be a workaround?
For reference: it seems to be possible when the backend is gloo instead of nccl. See discussion here: #8630 (reply in thread).
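A minimal sketch of what that looks like in plain PyTorch (my own illustration, not code from this thread): two DDP workers pinned to the same cuda:0 device, with gradient synchronization over gloo. The model, batch size, and rendezvous port are arbitrary placeholders.
```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

WORLD_SIZE = 2  # two replicas, one physical GPU


def worker(rank: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    # gloo allows several ranks to map to the same CUDA device,
    # whereas nccl refuses duplicate devices within one communicator.
    dist.init_process_group("gloo", rank=rank, world_size=WORLD_SIZE)

    device = torch.device("cuda:0")              # every rank shares GPU 0
    model = torch.nn.Linear(32, 1).to(device)
    ddp_model = DDP(model)                        # gradients all-reduced via gloo

    opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
    for _ in range(3):
        x = torch.randn(8, 32, device=device)    # smaller per-replica batch
        loss = ddp_model(x).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```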
Replies: 2 comments 11 replies
-
Hmm, interesting use case. AFAIU it is not possible, at least with nccl. From https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html: NCCL does not support several ranks of the same communicator sharing one CUDA device, so multiple replicas cannot be placed on a single GPU through that backend.
It probably can be done if you write custom gradient sync'ing logic, which moves the gradients to RAM before sync'ing and syncs them with a CPU-based backend such as gloo.
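As a rough illustration of that idea (a sketch of my own, assuming the process group was initialized with the gloo backend; `sync_grads_via_cpu` is a made-up helper name), the gradients can be staged in host memory and all-reduced there:
```python
import torch
import torch.distributed as dist


def sync_grads_via_cpu(model: torch.nn.Module) -> None:
    """Average gradients across ranks by staging them in RAM."""
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is None:
            continue
        cpu_grad = p.grad.detach().cpu()                   # GPU -> RAM
        dist.all_reduce(cpu_grad, op=dist.ReduceOp.SUM)    # CPU all-reduce (gloo)
        p.grad.copy_(cpu_grad.div_(world_size))            # RAM -> GPU, averaged
```
You would call this after `loss.backward()` on an unwrapped model, right before `optimizer.step()`.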
-
@ananthsub yes, potentially in our parsing, or alternatively also in the plugin, which already gets the list of devices.
-
@justusschock sorry, I'm not very familiar with how MPI works with GPUs in this context.
-
@yifuwang I tried a bit more, and it actually worked with the gloo backend!
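For anyone landing here later, a hedged sketch of how this might look with a newer Lightning API than the one used in this thread (I'm assuming your version exposes `DDPStrategy(process_group_backend=...)`; the duplicated device index may still trip Lightning's device-uniqueness check discussed further down):
```python
# Sketch only, not the exact setup used in this thread.
import pytorch_lightning as pl
from pytorch_lightning.strategies import DDPStrategy

trainer = pl.Trainer(
    accelerator="gpu",
    devices=[0, 0],  # two replicas on the same physical GPU (may be rejected by the uniqueness check)
    strategy=DDPStrategy(process_group_backend="gloo"),  # avoid nccl's one-rank-per-device restriction
    max_epochs=1,
)
# trainer.fit(model)  # `model` is your LightningModule
```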
-
@tholop did you experience the same kind of speedup you'd expect if you were training on multiple separate physical devices?
P.S.: I haven't experimented yet, but I suspect you might be able to apply the fractional GPU capability in Ray to achieve something like this.
-
@dmarx I did experience a speedup, but not as good as having separate physical devices. I didn't benchmark it thoroughly, though. I totally agree regarding Ray's fractional GPUs! I mentioned them in the original issue as a possible workaround, but it might require a bit more work than just passing a string to Lightning.
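For completeness, fractional GPUs in Ray are requested per task or actor via `num_gpus`; a tiny sketch (illustrative only, the task body is a placeholder):
```python
# Two tasks each reserve half of GPU 0 and can run on it concurrently.
import ray
import torch

ray.init(num_gpus=1)


@ray.remote(num_gpus=0.5)
def train_replica(replica_id: int) -> int:
    # Ray sets CUDA_VISIBLE_DEVICES so "cuda:0" here is the shared GPU.
    x = torch.randn(8, 32, device="cuda:0")
    return replica_id


# Both calls fit on the single GPU because each asks for 0.5 of it.
print(ray.get([train_replica.remote(0), train_replica.remote(1)]))
```
Note that fractional GPUs only handle scheduling and packing; gradient synchronization between the replicas would still need a backend like gloo.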
-
For reference: it seems to be possible when the backend is gloo instead of nccl.
-
PyTorch Lightning complains about using the same device ID in the current version. Any workaround? @tholop I'm certainly interested in this to get more steps rather than larger batches.
-
Hi @ksasso1028, I'm running into the same thing with PyTorch Lightning right now. Did you happen to find a workaround?
Update: I just commented out the check-unique-ids call in device_parser.py ... so far it's working OK.
-
Does it work with a multi-node setup too, or is it single-node multi-GPU only?