Multiple GPU Windows System #19866

Unanswered
ptrem asked this question in DDP / multi-GPU / multi-node
May 13, 2024 · 1 comment · 2 replies

Hi,
I have a Workstation with two RTX A6000 GPUs and a Windows System 🙈
and I would like to use both GPUs with Lightning-AI.
It's possible to use just one of the GPUs, but I get the following error if I try to use both.

The training works fine for:

```python
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=epochs,
    logger=wandb_logger,
)
```

But I can't use, for example, devices=2, devices=[0, 1], or devices=-1.
In every case I get the following error:
RuntimeError: Distributed package doesn't have NCCL built in
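
For completeness, a minimal sketch of the failing multi-GPU configuration, assuming the same pl, epochs, and wandb_logger as in the working example above:

```python
# Requesting more than one device leads to the NCCL error above
# once training starts on Windows (the same happens with
# devices=[0, 1] or devices=-1):
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    max_epochs=epochs,
    logger=wandb_logger,
)
```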

Does anyone have an idea how to run Lightning on multiple GPUs on a Windows system?
Thanks


Replies: 1 comment 2 replies


It's a bit late, but I believe the NCCL backend for DDP training is not supported on Windows.
This is how you should configure your trainer on a Windows system, but I experienced a significant slowdown before, so do compare the speed with NCCL on a Linux machine.

```python
from lightning.pytorch.strategies import DDPStrategy

ddp = DDPStrategy(process_group_backend="gloo")
trainer = Trainer(strategy=ddp, accelerator="gpu", devices=2)
```
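
As a quick sanity check, something like this should show whether the local PyTorch build ships with NCCL and Gloo support (a minimal sketch using torch.distributed's backend queries):

```python
import torch.distributed as dist

# Check which process-group backends this PyTorch build was compiled with.
# Windows wheels generally ship without NCCL, which is why "gloo" is needed here.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())
```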
2 replies
@ptrem

Thanks for your reply. I will try it out.

@davidgill97

> I tried it, but Trainer doesn't have a parameter where you can set strategy==ddp.

I'm not sure if I understood that correctly. According to the documentation, Trainer indeed has a parameter strategy.
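
For reference, a minimal sketch (assuming Lightning 2.x, matching the lightning.pytorch import above) of the two ways the documented strategy parameter can be passed:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DDPStrategy

# strategy accepts a registered strategy name ...
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=2)

# ... or a Strategy instance, which is what lets you force the
# Gloo backend instead of NCCL on Windows:
trainer = Trainer(
    strategy=DDPStrategy(process_group_backend="gloo"),
    accelerator="gpu",
    devices=2,
)
```

On Windows the second form is the relevant one, since passing the DDPStrategy instance is what selects Gloo rather than the unavailable NCCL backend.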
